A NEW editor must always be Janus-like, looking back at the history of the Journal
and forward to its future. On behalf of the editorial board and the readers I must
thank Professor Geoff Brown for the time and effort he has devoted to it, first as
assistant editor and then as editor. He has served as the latest in a long line of
distinguished editors and his going leaves me with hard acts to follow.


Previous editors have commented on the strength and stability of the Journal,
features which militate against sudden change. Certainly taking over the editorship
and preparing this issue has been like taking the controls of a moving vehicle with a 
momentum of its own. The first task has been to find the steering and adjust to the 
speed, but future tasks will have to do with direction and with maintenance of the 
traditional high quality of research publication. In these I know I shall be
enormously indebted to the team of assistant editors and to our sub-editor. But most 
of all the Journal will continue to depend on its contributors and readers. 


Looking back it seems to me that there has always been a strong interest in 
certain kinds of research. The development, calibration and testing of measurement 
instruments is one such concern. Another is the adoption of theoretical concepts 
from main-stream psychology and the exploration of their nature and relationships 
through statistical modelling based on data from educational contexts. Both kinds 
of work have become more sophisticated statistically over time, as evidenced by 
papers in this issue. A third area of work has been developmental psychology, much 
but not all being influenced by the work of Piaget and by reactions to it. I hope that 
in spite of the advent of new journals in the cognitive and developmental fields there 
will not be a diminution in those contributions to this journal which are of particular 
relevance to educational problems and practice. 


Two further kinds of article are welcome. Research which aims to illuminate 
learning and instructional processes and to give teachers and psychologists informed 
options for practice is central to educational psychology. It may adopt the objective, 
statistical stance towards the individual which has underlain differential and
experimental psychology; but it may also take more person- and process-oriented
stances, acknowledging the individuality of the learner or teacher with respect to his 
or her intentions, understandings and knowledge, and attempting to be descriptively 
faithful to the phenomenon of interest. Either way, for a well-grounded educational 
psychology, researchers must engage with the triple alliance of learner, learning and
what is learned. Perhaps in the past we have paid too little attention to the last of 
these, but I think that in recent years we have come more readily to acknowledge and 
try to understand the importance of the social and instructional contexts of learning. 
Instead of seeking the great generalisation we now look to defining the limits within 
which relationships hold and to understanding why such limits must be recognised. 
Since such work is still relatively undeveloped and educational contexts change over 
time we should be able to look forward to many interesting and useful articles. 


Most of the papers submitted for publication are reports of empirical research, 
but competent analyses of theory and method are equally welcome. Although the 
Journal does not aim to publish review articles it seems to me there is a place for 
consideration of past publications on particular research topics. Both readers and 
contributors could benefit from such help towards knowing how their own journal 
has contributed to the field. 


I have been very heartened to see that the Journal receives a good number of 
papers from overseas authors. If ways can be found to overcome translation 
problems I should like to see more from those for whom English is not a first or even 
a second language. At the recent conference in Leuven of the newly formed
European Association for Research in Learning and Instruction (EARLI) it was 
clear that there could be real benefit from such contributions. Unfortunately 
journals cannot provide translation services, but international co-operation in 
authorship can achieve much, and editors are sympathetic when considerable effort 
has already gone into a worthwhile paper. 


In these remarks I have most certainly omitted mention of particular research 
interests. There is no intention, however, to exclude any paper which is clearly of 
interest either conceptually or methodologically to an international readership of 
educationalists and psychologists. The Journal has always accepted well-based and 
well-argued research. I hope to serve it in an open and unbiased way, and to deserve 
the trust placed in me both by the editorial board and by our contributors. 


HAZEL FRANCIS


THE INFLUENCE OF PATERNAL SOCIAL CLASS
ON INTELLIGENCE AND EDUCATIONAL LEVEL 
IN MALE ADOPTEES AND NON-ADOPTEES 


By T. W. TEASDALE
(Psykologisk Institut, Kommunehospitalet, Copenhagen)


AND DAVID K. OWEN
(Department of Psychology, Brooklyn College, New York) 


Summary. Intelligence test scores and educational level of 241 18-26 year-old male
adoptees correlate with the social class of both their biological fathers (r = 0.218,
P < 0.01; and r = 0.173, P < 0.05) and their adoptive fathers (r = 0.216, P < 0.01;
and r = 0.359, P < 0.001). The correlation between the biological father’s social class and
the adoptee’s educational level may be largely mediated by the inheritance of 
characteristics more directly measured by the intelligence test. Conversely, the correlation 
between the adoptive father’s social class and the adoptee’s intelligence test score may be 
largely mediated by familial environmental influences on educational level. The multiple 
correlations of both biological and adoptive fathers’ social classes with the adoptees’ 
intelligence test scores (R = 0.292, P < 0.001) and with adoptees’ educational level 
(R = 0.384, P < 0.001) do not differ significantly (z < 0.6, NS) from the simple correla- 
tions of fathers’ social class with sons’ intelligence test scores and educational level 
in a non-adoptive control sample of 4,020 father-son pairs (r = 0.320, P < 0.001; and 
r = 0.417, P < 0.001). This suggests that familial influences on intelligence test scores and 
educational level may be adequately accounted for by an additive model including both 
genetic and familial environmental components. 


INTRODUCTION 


THE evidence that familial members resemble each other in intelligence, as measured
by IQ tests, may be one of psychology’s most firmly established empirical findings 
(Bouchard and McGue, 1981). Family members, however, share a degree of both 
common genes and a common environment, and the relative contributions of 
heredity and milieu in producing the observed similarities have been an issue of
debate for more than six decades (Block and Dworkin, 1976; Eysenck and Kamin, 
1981; Stott, 1983). 


The study of adoptees has always occupied a central position in this debate. 
Adoptees inherit their genes from one set of parents and are provided with a rearing 
environment by another set of parents. It has therefore been reasoned that their 
degree of similarity to the former, their biological parents, should provide 
unequivocal evidence concerning genetic influences, whereas their similarity to the 
latter, their adoptive parents, should provide corresponding evidence concerning 
familial environmental effects (Munsinger, 1975). However, the pioneering studies 
of adoptees (e.g., Burks, 1928; Leahy, 1935; Skodak and Skeels, 1949), most of 
which were long regarded as having provided consistent evidence for an important 
genetic contribution to intelligence, have been called into question on various 
methodological grounds (Kamin, 1974), and the need for fresh data has become 
clear. Only comparatively recently have large-scale studies of intelligence in 
adoptees begun to reappear (Scarr and Weinberg, 1978; Horn et al., 1979; DeFries 
et al., 1981). In general, the results continue to implicate some genetic contribution 
to intelligence (Loehlin, 1980). 


The controversy regarding the determinants of intelligence has undoubtedly 
been fuelled by the strong relationship between IQ and educational level, since
educational level itself is so decisive a determinant of other dimensions of consider- 
able practical significance, for example, income and social status (Jencks, 1972). 
Despite this fact, the role of heredity and familial environment in educational level 
itself has been much less directly studied — probably for no better reason than that 
adoptees are usually studied as children before they have attained their final 
educational level. Recent evidence from a large-scale twin study, however, has 
suggested an important role for heredity in educational level (Heath et al., 1985).


We have earlier reported on a study of both intelligence and educational level in 
two types of young adult adoptee siblings (Teasdale and Owen, 1984a). Sibling pairs 
who were genetically related but reared separately resembled each other more in 
intelligence than in educational level, while the reverse was true of sibling pairs who 
were genetically unrelated but reared together. 


These findings suggest hypotheses which go beyond simplistic statements 
about, for instance, heritabilities, and concern rather the nature of the reciprocal 
influences of intelligence and educational level upon each other (Snow and Yalow, 
1982). Thus, if heredity is of more importance in intelligence than in educational 
level, then genetic variance for intelligence might itself be the major genetic 
contribution to educational level because, in part, the educational system selects 
from variation in intelligence. Correspondingly, if the familial environment has 
more impact upon educational level than upon intelligence, then familial environ- 
mental contributions to variation in educational level might be the main source of 
familial environmental contributions to intelligence since, in part, variation in 
educational level produces variation in intelligence. 


The present study reports correlations between the intelligence and educational 
level of the adoptees and the social classes of their biological and adoptive fathers. 
These correlations provide an opportunity to confirm the premises for the above 
hypotheses. Through a genetic contribution, the social class of the biological father 
could be expected to correlate more highly with his son’s intelligence than with his 
educational level, whereas conversely, through a familial environmental contribu- 
tion, the social class of the adoptive father could be expected to correlate more 
highly with his son’s educational level than with his intelligence. In order to examine 
the degree to which adoptees and their biological and adoptive fathers are 
representative of the general population, and thereby the generalisability of findings 
derived from them, we have also included data on offspring intelligence and educa- 
tional level and paternal social class in non-adoptive families. 


METHOD 
Subjects 
The sample of adoptees used in this study derives from a register of all 14,427 
non-familial adoptions granted in Denmark between 1924 and 1947 (Teasdale, 
1979). For each adoption, there is a record identifying the name, date of birth, and 
birthplace for both sets of parents and the adoptee. Also usually noted is the age of 
the adoptee at transfer to the adoptive home and the occupations of the biological 
and adoptive fathers at the time of adoption. Occupations were available for fewer 
than 40 per cent of the biological mothers and for almost none of the adoptive 
mothers, and these are therefore not used here. 


The adoptees in this study are siblings and, although the present report is 
concerned with parent-offspring resemblance rather than the already reported 
resemblance between siblings (Teasdale and Owen, 1984a), the nature of the sibling 
relationships needs to be explained. 
Among the 14,427 adoptees, 2,948 were related by virtue of common genetic
and/or adoptive parentage (Teasdale and Owen, 1981), the most usual of these 
relationships being biological full- and half-siblings separately adopted and reared 
by different adoptive parents, and genetically unrelated pairs of individuals adopted 
and reared together by the same adoptive parents (i.e., adoptive siblings). Since the 
adoptee data in the present study stem from military draft board records dating 
from 1956 onwards, we were restricted to using biological and adoptive siblings who 
were male and born between 1938 and 1947. Of these, there were 33 pairs and 1 triad 
of biological full siblings (reared apart), 51 pairs and 1 triad of biological maternal 
half-siblings (reared apart), 36 pairs and 5 triads of biological paternal half-siblings 
(reared apart), and 27 pairs of genetically unrelated adoptive siblings (i.e., reared 
together). Because of overlap among the four groups, the total number of adoptees 
involved was 302 rather than 315; eight of the 13 overlaps arose from four instances 
in which a pair of biological full-siblings shared a paternal half-sibling. 
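
The counts in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: the group labels and the helper name `memberships` are our own, and the figure of 13 overlaps is taken directly from the text rather than derived.

```python
# Sibling groups as (pairs, triads), copied from the paragraph above.
groups = {
    "biological full siblings, reared apart": (33, 1),
    "biological maternal half-siblings, reared apart": (51, 1),
    "biological paternal half-siblings, reared apart": (36, 5),
    "adoptive siblings, reared together": (27, 0),
}


def memberships(pairs, triads):
    """Number of group memberships contributed by the pairs and triads."""
    return 2 * pairs + 3 * triads


total_memberships = sum(memberships(p, t) for p, t in groups.values())

# Adoptees counted in more than one group, as stated in the text.
overlaps = 13
distinct_adoptees = total_memberships - overlaps

print(total_memberships, distinct_adoptees)
```

Summing memberships over the four groups and subtracting the stated overlaps gives the number of distinct adoptees reported in the text.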


The 302 adoptees had 225 biological fathers, of whom 65 had adopted away 
two sons and six had adopted away three sons, and 275 adoptive fathers, of whom 
27 had adopted two sons. No two adoptees shared both the same biological and 
adoptive father. The median age of the adoptees at transfer to their adoptive homes 
was 5 months, and 86 per cent were transferred by two years of age. The geographic 
distribution of the adoptees throughout Denmark was approximately proportional 
to that of the general population. 


As a non-adoptive control group, we have used data on 4,578 males collected in 
an earlier study (Witkin et al., 1976). For reasons related to the design of that study, 
these control males are all at least 184 cm tall, they were born from 1944 through 
1947, and their births were registered in Copenhagen. In most cases, paternal 
occupation at the time of birth was available from birth certificate records. It is 
quite reasonable to assume that the great majority of these non-adopted males were 
reared by the fathers named on the birth certificates. 


Materials 

Paternal social class. The occupations of the biological, adoptive, and control 
fathers were uniformly rated on an eight-point scale of social class (0 = low, e.g., 
porter; 7 = high, e.g., university professor). Details of the scale have been presented 
elsewhere (Teasdale, 1979). From the 302 adoption records, rateable occupations 
were available for 270 of the biological fathers and 300 of the adoptive fathers. In a
further seven cases where the adoption record did not specify an occupation for the
biological father, his occupation was duplicated from another record in which he
appeared. Rateable occupations were available for 4,366 of the 4,578 control
fathers.


TABLE 1
SOCIAL CLASS OF BIOLOGICAL AND ADOPTIVE FATHERS TO ADOPTEES AND FATHERS TO CONTROLS

                                      Social Class
Father        0      1      2      3      4      5    6+7   Missing   Mean    SD

Biological Father to Adoptee (N = 302)
  N          88     30     97     31     28      1      2      25     1.61   1.38
  %        31.8   10.8   35.0   11.2   10.1    0.4    0.7
Adoptive Father to Adoptee (N = 302)
  N          41     32     62     46     76     29     14       2     2.76   1.70
  %        13.7   10.7   20.7   15.3   25.3    9.7    4.7
Control Father (N = 4578)
  N         577    316   1263    770    836    282    322     212     2.71   1.67
  %        13.2    7.2   29.0   17.6   19.1    6.5    7.4


The distribution of social class ratings for the biological and adoptive fathers to 
the 302 adoptees, and for the fathers to the 4,578 controls, is shown in Table 1. 
Among the 65 biological fathers who appear twice in this table and the six who 
appear three times, 21 had different social class ratings on the different adoption 
records, but in only five cases did the ratings differ by more than two points. 
Correspondingly, of the 27 adoptive fathers who appear twice in the table, three had 
different social class ratings on the two adoption records, but in no case did they 
differ by more than two points. 


Adoptees’ and controls’ intelligence and educational level. These data were 
obtained from draft board records. There has been a national conscription in Denmark 
continuously since 1945, and all males are required to appear before a draft board 
for examination not earlier than the year in which they become 18 and not later than 
the year in which they become 26. The large majority do so at age 18 or 19. About 5 
per cent are exempted, most commonly because of a certified disqualifying illness — 
for example, epilepsy, diabetes, spinal disorders, and institutionalised mental 
retardation. 


Since 1956, and continuing to the present day, the examination has included a 
group intelligence test, Børge Prien's Prøve (BPP). The BPP is a 45-minute paper-and-pencil
test comprising four subtests: (a) a letter matrices test resembling Raven’s
Progressive Matrices but with the matrix cells being patterns of alphabetic letters, 
and where the empty cell is to be filled in by the subject rather than selected from a 
number of forced-choice alternatives (19 items, 15 minutes); (b) a verbal analogies 
test in which the response is to be selected from a list of 100 words (24 items, 5 
minutes); (c) a number series test (17 items, 15 minutes); and (d) a geometric figures 
test in which the subject is required to indicate which of five simple geometric figures 
must be used to construct a complex figure (18 items, 10 minutes). Within each 
subtest the items are arranged in an empirically determined order of increasing 
difficulty. The test score is the total number of correct items out of 78; the subtest 
scores are not recorded. 


The BPP test was constructed in the early 1950s by Børge Prien in accordance 
with pioneering single parameter latent-trait principles developed by a Danish 
statistician, Georg Rasch (1980), principles which are now widely used and often 
referred to as “Rasch Models” (see Anastasi, 1982, p. 214, and Cronbach, 1984, p.
117). There are no published data comparing the BPP to more universally known 
intelligence tests, but for 76 male subjects used in another study we obtained a 
within-group correlation of 0.77 between a BPP score and full scale IQ on the 
Wechsler Adult Intelligence Test. 


The draft board also recorded educational level using a nine-point scale. To 
describe this scale it is necessary to give an outline of the Danish educational system. 
Through the years during which the subjects in the present study were attending 
school (1945-1965), education was compulsory in Denmark between the ages of 7
and 14. For the great majority, this was provided by seven years of public primary 
schools (Folkeskole). This minimum could be supplemented by either up to a further 
three years of Folkeskole or, dependent upon teacher recommendation or examina- 
tion success, by transfer to a rather higher standard three-year stream (Realskole) 
terminating in a public examination, Realeksamen. At the end of the second year of 
Realskole (i.e., one year prior to the Realeksamen), and again dependent upon 
teacher recommendation or examination success, a second transfer could be made to 
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an academically oriented three-year school (Gymnasium) leading to a university 
matriculation examination (Studentereksamen). 


In addition to this school system, numerous forms of further education were 
available, varying in prerequisites and standards. These ranged from, at the lowest 
levels, day-release or night-school courses for further general and practical 
education without examinations; through, at intermediate levels, apprenticeships 
and technical, agricultural and commercial training; to, at the highest levels, non- 
academic professional training (e.g., school teachers and engineers) and university 
education. 


The nine-point scale used by the draft board was constructed to rank 
educational level giving, where necessary, equal weight to comparable qualifications 
in different areas. The following is a simplified description of each level of the scale, 
a modified version of which has been used in a large-scale Danish study of social 
class (Svalastoga, 1959):

1. seventh grade of Folkeskole only, with no further education;
2. seventh grade of Folkeskole supplemented by a non-examined general or
   practical education course,
   or
   eighth grade of Folkeskole;
3. eighth grade of Folkeskole supplemented as for 2 above
   or
   ninth grade or tenth grade of Folkeskole;
4. completed trade apprenticeship;
5. first or second Realskole grade (i.e., Realskole without attaining the
   Realeksamen);
6. Realeksamen
   or
   some equivalent forms of examined commercial and technical education
   or
   first or second Gymnasium school grade (i.e., Gymnasium school without
   attaining the Studentereksamen);
7. various forms of non-academic professional training, most of which have
   the Realeksamen or some equivalent as a prerequisite (e.g., teaching,
   engineering);
8. Studentereksamen;
9. university education.


TABLE 2
EDUCATIONAL LEVEL AND INTELLIGENCE TEST SCORE OF SONS (ADOPTEES AND CONTROLS)

                                   Educational Level
              1      2      3      4      5      6      7      8      9   Missing   Mean    SD

Adoptee (N = 302)
  N          35     35     17     80     21     49     11     12      3      39     4.09   2.00
  %        13.3   13.3    6.4   30.4    8.0   18.6    4.2    4.6    1.1
Control (N = 4578)
  N         171    204    121   1192    281    999    190    790    239     391     5.47   2.07
  %         4.1    4.9    2.9   28.5    6.7   23.9    4.5   18.9    5.7

Intelligence Test Score (BPP)      Missing   Mean    SD

Adoptee (N = 302)                     39     34.5   11.6
Control (N = 4578)                   391     43.2   11.7
BPP and educational level data were traced in the draft board archives for 263
of the 302 adoptees. The remaining 39 adoptees comprise 20 cases for whom a 
record was found indicating that the adoptee had been exempted from appearing 
before the board (one because of mental retardation) and 19 cases for whom no 
record could be found. A similar tracing yielded BPP and educational level data for 
4,187 of the 4,578 non-adoptee controls. Mean levels for both groups, together with 
frequency distributions of educational level, are shown in Table 2. 
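
As an illustration of how the summary figures in Table 2 relate to the frequency distributions, the following sketch recomputes the mean and standard deviation of educational level for the control sons from their level counts. The function name `grouped_mean_sd` is our own, and the use of the n - 1 divisor is an assumption; the text does not say which divisor was used, and at this sample size the choice makes no visible difference.

```python
from math import sqrt

# Control sons' counts at educational levels 1..9, copied from Table 2.
control_counts = [171, 204, 121, 1192, 281, 999, 190, 790, 239]


def grouped_mean_sd(counts, first_level=1):
    """Mean and SD of an ordinal scale from per-level frequencies.

    The SD uses the n - 1 divisor (an assumption; see the note above).
    """
    n = sum(counts)
    levels = range(first_level, first_level + len(counts))
    mean = sum(k * c for k, c in zip(levels, counts)) / n
    sum_sq = sum(c * (k - mean) ** 2 for k, c in zip(levels, counts))
    return n, mean, sqrt(sum_sq / (n - 1))


n, mean, sd = grouped_mean_sd(control_counts)
print(n, round(mean, 2), round(sd, 2))
```

The counts sum to the 4,187 controls with traced records, and the computed mean and SD agree with the tabled values to two decimal places.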


RESULTS 


The biological fathers have a significantly lower mean social class than the 
control fathers (t(4641) = 10.73, P < 0.001) and have a significantly smaller 
variance (Fmax (276, 4365) = 1.47, P < 0.001). The social class of the adoptive
fathers does not differ significantly from that of the control fathers in either mean 
(t(4644) = 0.49, NS) or variance (Fmax (299, 4365) = 1.04, NS). The adoptees them- 
selves are significantly lower than the controls both in mean BPP (t(4449) = 11.70, 
P < 0.001) and educational level (t(4449) = 10.51, P < 0.001). There are, however, 
no significant differences in variances between the two groups (for Educational 
Level, Fmax (262, 4186) = 1.07, NS; for BPP, Fmax (262, 4186) = 1.02, NS).
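
The group comparisons above can be approximately reproduced from the rounded means and standard deviations in Table 1. The sketch below assumes a pooled-variance t test, which is consistent with the reported 4,641 degrees of freedom but is not stated explicitly in the text, and takes Fmax as the ratio of the larger to the smaller variance. The function names are our own, and results from rounded inputs can differ from the published values in the second decimal place.

```python
from math import sqrt


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def variance_ratio(s1, s2):
    """Ratio of the larger to the smaller variance, given two SDs."""
    big, small = max(s1, s2), min(s1, s2)
    return (big / small) ** 2


# Control fathers (N = 4,366 rated) versus biological fathers (N = 277 rated).
t, df = pooled_t(2.71, 1.67, 4366, 1.61, 1.38, 277)
f = variance_ratio(1.67, 1.38)
print(round(t, 2), df, round(f, 2))
```

With these inputs the t statistic and degrees of freedom match the reported comparison of biological and control fathers, and the variance ratio falls within rounding error of the reported Fmax.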


Correlational analyses have been restricted to the 241 adoptive and 4,020 
control sets for whom complete data (i.e., paternal social class and adoptee and 
control BPP and educational level) are available. Among the 241 adoptions, the 
correlation between the social classes of the biological and adoptive fathers is 0.101 
(P < 0.05). This suggests that the adoption agencies have, to some extent, selected, 
for the adoptees, adoptive parents who bear some resemblance to the biological 
parents. The correlations between the fathers’ social class and their sons’ BPP and 
educational level are shown in Table 3. All of these correlations are significant 
(P < 0.05). The correlation between BPP and educational level is 0.670 for the 
adoptees and 0.720 for the controls. These correlations do not differ significantly 
(z = 1.46, NS). 
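
The z test for the difference between two independent correlations is not described in the text; a standard choice is Fisher's r-to-z transformation, and the sketch below assumes that this is what was used. The function name is our own, and the result from rounded correlations may differ slightly from the published value.

```python
from math import atanh, sqrt


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (atanh(r1) - atanh(r2)) / se


# BPP with educational level: controls (N = 4,020) versus adoptees (N = 241).
z = fisher_z_difference(0.720, 4020, 0.670, 241)
print(round(z, 2))
```

The value obtained is close to the reported z of 1.46 and well below the conventional two-tailed critical value of 1.96.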





TABLE 3
CORRELATIONS BETWEEN FATHERS’ SOCIAL CLASS AND SONS’ EDUCATIONAL LEVEL AND INTELLIGENCE
TEST SCORES

                                         Sons
Father Social Class        Educational Level    Intelligence (BPP)

Adoptees (N = 241)
  Biological Father             0.173*               0.218**
  Adoptive Father               0.359***             0.216**
Controls (N = 4020)
  Control Father                0.417***             0.320***

* P < 0.05. ** P < 0.01. *** P < 0.001.


In order to compare the relative magnitude of paternal contributions to 
adoptees’ BPP and educational level, respectively, we have employed a statistic 
proposed by Williams (1959) and cited, as T2, by Steiger (1980). This statistic
permits the comparison of two correlations sharing a common variable. The 
correlation of the adoptive fathers’ social class with the adoptees’ educational level 
(0.359) is, as predicted, significantly greater than the correlation with adoptees’ BPP 
(0.216; T2 (238) = 2.91, P < 0.01). The partial correlation of adoptive fathers’ social
class with the adoptees’ BPP, controlling for adoptees’ educational level, is -0.04
(NS). Conversely, the correlations between the biological fathers’ social class and 
the adoptees’ BPP and educational level (0.218 and 0.173, respectively) are not 
significantly different (T2 (238) = 0.88, P > 0.1). Yet it is interesting to note that the
direction of the difference, namely that the biological fathers’ social class better
predicts the adoptees’ BPP than their educational level, is, consistent with our
hypothesis, the reverse of that for the adoptive fathers. The partial correlation of the
biological fathers’ social class with the adoptees’ educational level, controlling for 
BPP, was 0.04 (NS). 
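
For readers who wish to retrace these comparisons, the sketch below implements the Williams statistic in the form given by Steiger (1980) for two correlations that share a variable, together with the usual first-order partial correlation. The function and argument names are our own, and the inputs are the rounded correlations reported in the text (including the BPP-educational level correlation of 0.670 among the adoptees), so small rounding differences from the published values are to be expected.

```python
from math import sqrt


def williams_t2(r12, r13, r23, n):
    """Williams's test of r12 against r13, where variable 1 is shared.

    Returns the statistic and its degrees of freedom (n - 3).
    """
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3
    return (r12 - r13) * sqrt((n - 1) * (1 + r23) / denom), n - 3


def partial_r(rxy, rxz, ryz):
    """Correlation of x and y with z partialled out."""
    return (rxy - rxz * ryz) / sqrt((1 - rxz**2) * (1 - ryz**2))


n, r_edu_bpp = 241, 0.670

# Adoptive father's social class: educational level versus BPP.
t_adoptive, df = williams_t2(0.359, 0.216, r_edu_bpp, n)
# Biological father's social class: BPP versus educational level.
t_biological, _ = williams_t2(0.218, 0.173, r_edu_bpp, n)

# Adoptive father with BPP, controlling for educational level.
p_adoptive = partial_r(0.216, 0.359, r_edu_bpp)
# Biological father with educational level, controlling for BPP.
p_biological = partial_r(0.173, 0.218, r_edu_bpp)

print(round(t_adoptive, 2), round(t_biological, 2), df)
print(round(p_adoptive, 2), round(p_biological, 2))
```

Both partial correlations are close to zero, which is the pattern the mediation argument in the Summary relies on.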


The multiple correlation of both biological and adoptive fathers’ social class 
with adoptees’ BPP is 0.292 (P < 0.001) and with adoptees’ educational level is 0.384 
(P < 0.001). The BPP multiple correlation does not differ significantly from the 
correlation between the control fathers’ social class and the controls’ BPP, 
r = 0.320 (z = 0.46, NS). Similarly, the educational level multiple correlation does
not differ significantly from the correlation between the control fathers’ social class 
and the controls’ educational level, r = 0.417 (z = 0.59, NS). 
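
With only two predictors, the multiple correlation can be recovered from the three pairwise correlations. The sketch below applies the standard two-predictor formula to the rounded values reported above (the correlations in Table 3 and the correlation of 0.101 between the two fathers' social classes); the function name is our own, and we assume, without confirmation from the text, that the published multiple correlations were obtained by ordinary least-squares regression, for which this formula is exact.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors from pairwise correlations."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return sqrt(r_squared)


r_fathers = 0.101  # biological with adoptive father's social class

# Order of arguments: biological father, adoptive father, father-father.
R_bpp = multiple_r(0.218, 0.216, r_fathers)
R_edu = multiple_r(0.173, 0.359, r_fathers)
print(round(R_bpp, 3), round(R_edu, 3))
```

Because the two fathers' social classes are only weakly correlated, each multiple correlation is close to the square root of the sum of the two squared simple correlations.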


DISCUSSION 


That the social class of biological fathers of the adoptees should lie below that of 
the control fathers confirms an earlier finding (Teasdale and Owen, 1984b) and is not 
surprising given the usual circumstances surrounding adoptions. That the social 
class of the adoptive fathers should not be greater than that of the control fathers, 
and that the adoptees should be significantly below the non-adoptee controls in both 
intelligence and educational level, are both unexpected findings. We have earlier 
found adoptive fathers to have a higher mean social class, and adoptees to have the 
same mean social class, as the general population (Teasdale and Owen, 1984b). The 
present discrepancies are probably attributable to the special nature of the control 
sample which was selected for height (at least 184 cm), was almost exclusively 
metropolitan rather than covering the entire country, and was born on average later 
(1944-1947) than the adoptees (1938-1947). 


Of more importance in the present context is, however, that with one exception, 
the variances do not differ significantly on either parental or filial measures between 
the adoptive and control groups. Restricted variance has otherwise frequently been 
observed in adoption studies (Loehlin, 1980) and renders interpretation and 
generalisation difficult since it attenuates parent-offspring correlations. The single 
exception in our data is that the variance of social class amongst the biological 
fathers appears to be restricted and we may thus be underestimating the degree of 
genetic contributions. 


The correlation between the social classes of the biological and adoptive fathers 
is modest but significant and indicates that the adoption agencies had, to a degree, 
placed the children of relatively high social class biological fathers into homes where 
the adoptive father was also of high social status. The crucial question is, to what 
degree? It has repeatedly been pointed out that a correlation between genetically 
related individuals who have experienced separate environments does not necessitate 
the inference of a genetic component unless the environments are uncorrelated (e.g., 
Lewontin, 1975). There has been considerable discussion of how far this critical
requisite has been met in the major adoption studies, including studies of separately 
reared monozygotic twins, conducted hitherto (Kamin, 1974; Fulker, 1975; Taylor, 
1980; Farber, 1981; Bouchard, 1984). 


Most of the adoptions in the present study (at least 70 per cent) were arranged 
by one agency, the “Mother’s Aid Society” (Mødrehjælpen), and we have consulted
three of its leading officials of the 1940s (including the Director), independently, 
concerning how adoptions at that time were arranged in practice. The society 
subscribed to the view that intelligence might be to some degree heritable, and some 
effort was therefore made, in the interests of the child, to match the biological 
parents with the adoptive, in intellectual ability. However, the scope of
opportunities to do so was restricted in two ways. First, the information on which to 
base such matching was limited; second, the ability to utilise such information as 
was available was also limited. 


The biological mother wishing to adopt away her child typically contacted the 
society during pregnancy. At her first visit to a society office, she was interviewed by
a social worker. The main purpose of this interview was to present very clearly to the 
pregnant woman the full legal and social implications of adopting away, or 
retaining, her child; the decision being, of course, hers alone. She was also asked 
about her own background and education and, as far as she was able to report it, 
that of the biological father. In this way, his occupation would almost always have 
been known to the society. The biological mother was not given any form of 
intelligence test during the interview. Granted her permission, which was usually 
forthcoming, information on her academic abilities, in the form of a brief question- 
naire, was sought from the last school she attended. This same information was also 
sought on the biological father, conditional on his permission. It is estimated that 
only about one third of the biological fathers gave such permission, this being the 
approximate proportion who were directly in contact with a society official. 
However, the mean age of the biological mothers in the present study was 23 years at 
the time of the birth of the adoptee, and the biological fathers were 25. Thus, on 
average, almost a decade had elapsed since they left school. After such an interval, 
the quality of the information obtained was inconsistent and often of rather limited 
value. 


The adoptive parents were asked about their educational level during an 
interview, but the information they gave was not verified. They were not given any 
form of intelligence test. The adoptive father’s occupation and income were, 
however, verified by reference to income tax authorities. Their home was also 
inspected by a social worker. It is noteworthy that only rarely, and somewhat 
against policy, did the same social worker interview both the biological mother and 
the adoptive parents. 


Even given the requisite information, the ability of the society to match 
biological and adoptive parents on characteristics related to intelligence was still 
limited by several further constraints. During the period of relevance to this study, 
the society was organised into seven regions covering the entire country, and all 
adoptions were arranged within regions — the regions were later integrated with a 
specific objective, amongst others, of pooling all potential biological and adoptive 
parents in order to facilitate matching. Matching for intellectual ability had also 
usually to compete with other criteria, especially physical, for example, height, eye
and hair colour, as well as other considerations, such as the adoptive parents’ 
preferences regarding the sex of the child and the desire on the part of the society to 
ensure a reasonable geographical distance between the addresses of the biological 
mother and the adoptive parents. Finally, it should be noted that such matching as 
was attempted was not symmetrical. Although the child of biological parents of 
exceptional intellectual abilities would not deliberately be placed with low 
intellectual parents, the reverse was not necessarily true. Adoptive parents of high 
status and intelligence might well be given a child of relatively low intellectual 
origins if those parents were judged not to place undue emphasis on academic 
expectations for their adopted child. 


The adoption agencies other than the Mother's Aid Society operated 
independently, and, by reason of the small numbers involved at any one agency and 
their much more limited ability to collect the relevant information, they would have 
had still less opportunity to match parents on intellectual status.


The most reliable and available information on which the agencies could base a
matching between the biological father and adoptive parents on characteristics of 
relevance to intelligence in the child would appear to be their respective occupations. 
On balance, therefore, it is our view that the modest correlation observed between 
the social classes of the biological and adoptive fathers in the present study is an 
accurate representation of the degree of matching for intellectual status which was, 
in reality, achieved. This being so, then the matching can account for only a small 
proportion of the correlations between the biological fathers and the adoptees, or, 
pari passu, between the adoptive fathers and the adoptees. Both sets of correlations, 
therefore, must seem indicative of genuine influences, genetic and familial environ- 
mental. 


Our hypotheses concerning the nature of the genetic and familial environmental 
contributions to intelligence and educational level are only partially supported by 
the present findings. As predicted, the social class of the adoptive father is more 
strongly related to the adoptee’s educational level than to his intelligence. Thus, the 
familial environmental effect on educational level may operate on other factors 
relevant to education than intelligence, for instance, motivational patterns and 
learning strategies (Biggs, 1978). Furthermore, much of the relationship between the 
adoptive fathers’ social class and the adoptees’ intelligence could be accounted for 
by assuming that it was the formal education itself which was the primary environ- 
mental influence upon intelligence. 


The social class of the biological fathers correlated more with the adoptees' 
intelligence than with their educational level, although the difference was not 
significant. It nonetheless remains tempting to speculate that the intelligence test is 
more closely sensitive to some form of genetic variance and that educational 
systems, in addition to influencing intelligence, also in part select for an intelligence 
and ability which manifests itself in educational situations — but which is not wholly 
determined by them. 


The multiple correlations of both biological and adoptive fathers' social class 
with the intelligence and educational levels of the adoptees lie very close to the 
corresponding simple correlations between control fathers and their sons. This 
implies that these characteristics in the adoptees are as well predicted by the additive 
contributions of both heredity and familial environment as intelligence and 
educational level in non-adoptees are predicted by the social class of their fathers. 
This is in marked contrast to an earlier finding that the social class of adoptees is 
much less well explained than that of non-adoptees (Teasdale and Owen, 1984b). It 
does, however, suggest that variance in intelligence and educational levels may be 
accounted for in terms of additive genetic and familial environmental
effects. 


There is a very close agreement between the correlations we obtain for the non- 
adoptive controls and corresponding values reported by Jencks (1972, p. 322) from 
several large-scale studies. Thus, where we obtain a fathers’ social class with sons’ 
educational level correlation of 0.417, Jencks reports 0.420; similarly, our 
correlation of 0.320 between fathers’ social class with sons’ intelligence is matched 
by Jencks’ value of 0.314. Our present results might therefore have wide
generalisability.


INTELLIGENCE, FACT OR ARTEFACT:
ALTERNATIVE STRUCTURES FOR COGNITIVE ABILITIES

AND D. M. ROMNEY

Summary. A major source of empirical support for the generally accepted common
factor structure of intelligence is (exploratory) factor analysis. This excludes the 
possibility of many alternative models which can only be evaluated through the 
confirmatory methods of structural modelling. Using this approach in the analysis of 
Richmond Basic Skills Test data, we demonstrate that a different kind of model where 
one type of ability leads to and influences another (a simplex) not only fits the data, but is 
also consistent with theory in developmental psychology. An elaboration of the simplex, 
termed for convenience a ‘split simplex’, enables all the tests (verbal and mathematical) to 
be incorporated in one model. We conclude that the simplex and its variant should be 
considered as plausible alternatives to the classical model. 


INTRODUCTION 


THE concept of intelligence has been dominated in education by what Richardson 
and Bynner (1984) described as the ‘strength’ model. The differences between 
individuals with respect to their performance of cognitive tasks are considered to 
reside ultimately in a single cognitive factor, ‘intelligence’ which all of them are born 
with to a greater or lesser extent. 


Empirical support for this conceptualisation of cognitive ability goes back to 
Spearman (1904) who first developed the method of factor analysis to test Galton’s 
theory that a single common factor (g) and a specific factor (s) constituted cognitive 
ability as measured by educational tests. 


This model was subsequently elaborated by Burt (1941) and Vernon (1950) to 
include intermediate group factors so that intelligence was represented by a 
hierarchical structure. Opposition to the Spearman model came from Thurstone 
(1948) who proposed a multiple group factor theory of intelligence which involved 
several (correlated) primary factors instead of a single global factor and replaced 
(test) specific factors (s) by random measurement error. However, Thurstone’s 
group factors may be reduced to the same general factor by means of ‘higher order’ 
factor analysis (see Jensen, 1980, p. 215). Further alternatives are Guilford’s 
tridimensional model of the structure of intelligence (Guilford and Hoepfner, 1971), 
and Cattell’s two dimensional model (Cattell, 1971) recently elaborated by Horn 
(1980). 


Despite their differences all the models share the assumption that correlations 
between variables are produced by one or more common underlying factors. The 
limitations of this approach will be evident to anyone familiar with the development 
over the last decade of confirmatory factor analysis (Jöreskog and Sörbom, 1979)
within the framework of structural equation modelling of covariance matrices. 
These methods allow the researcher to estimate and test any identified linear model 
against observed data, taking account of measurement error. The 'exploratory' 
techniques employed in the factor analysis of mental tests direct research to only one 
such model — factors representing underlying dimensions — and exclude alternative 
possibilities. As we see in the next section, these alternatives include ‘path’ models
suggesting causal ordering among the tests. In this paper we use the confirmatory
techniques available in the LISREL program (Jöreskog and Sörbom, 1981) to
investigate such structures for cognitive abilities.


ALTERNATIVE MODELS OF ABILITY 


FIGURE 1
ALTERNATIVE STRUCTURAL MODELS

[Path diagrams: panel (a) correlation; panels (b) to (e) structural models.
Note: residual effects omitted.]


Figure 1 shows four alternative ways of placing a causal interpretation on the
correlation between two tests of cognitive ability, T1 and T2. In the absence of
evidence to the contrary, all four models are equally valid explanations of the
correlations, yet only one of them, model d, is entertained in the traditional factor 
analytic view of intelligence. The unidirectional and reciprocal causal effects 
represented by the path models b, c and e cannot be investigated by factor analytic 
methods. But in terms of developmental psychology considerations (Anastasi, 1983) 
these seem the more appropriate to adopt. Hunt (1961, 1969), for example, 
conceptualises intelligence as an unfolding process, running from the early orienting
reflex of the new born through recognition of the familiar to searching for and 
mastering of perceptual complexity. Piaget (1976) divides the cognitive development 
of the child into discrete qualitatively distinct stages which follow an invariant 
sequence with each succeeding stage incorporating and extending the accomplish- 
ments of the preceding stage. These conceptions suggest a flow of influence from 
one cognitive ability to the next with less complex skills leading on to more complex 
skills. Such a linear ordering (Models b and c) will be manifested in correlational 
data by what Guttman (1954) describes as a simplex structure. 


FIGURE 2
FACTOR AND SIMPLEX CORRELATION STRUCTURE

[Diagrams: (a) structure of a correlation matrix with one factor; (b) structure of a
correlation matrix showing a simplex. Both axes list variables 1 to n.]

Notes
1 Variables are ordered in terms of factor loadings with variable n having the
highest factor loading.
2 Arrows show the direction of decrease in size of correlations.


If a set of variables 1 to n sharing a single common factor in varying degrees are
ordered in terms of the factor loadings (highest at the top, lowest at the bottom),
then the correlations between them will be ordered as in Figure 2a: the correlations
descend in size on the left hand side of the principal diagonal and increase in size on 
the right hand side of it. If, on the other hand, these variables form a simplex then 
the correlations will be ordered as in Figure 2b: the correlations decrease in size from 
the principal diagonal to the corners of the matrix. 
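

The contrast between the two patterns can be made concrete with a small numerical sketch. It is an illustration only: the loadings and adjacent correlations are hypothetical values, not estimates from any data discussed here, and the helper names are our own. It assumes a one-factor structure in which each correlation is the product of the two loadings, and a perfect simplex in which the correlation between two variables is the product of the adjacent correlations lying between them.

```python
import numpy as np

def one_factor_corr(loadings):
    """Correlations implied by a single common factor: r_ij = l_i * l_j."""
    l = np.asarray(loadings, dtype=float)
    R = np.outer(l, l)
    np.fill_diagonal(R, 1.0)
    return R

def simplex_corr(adjacent):
    """Correlations implied by a perfect simplex (a chain of abilities).

    adjacent[k] is the correlation between variables k and k + 1; the
    correlation between i and j is the product of the links between them.
    """
    a = np.asarray(adjacent, dtype=float)
    n = len(a) + 1
    R = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            R[i, j] = R[j, i] = np.prod(a[i:j])
    return R

# Hypothetical values chosen only for illustration.
factor_R = one_factor_corr([0.9, 0.8, 0.7, 0.6])
simplex_R = simplex_corr([0.8, 0.8, 0.8])

print(np.round(factor_R, 2))
print(np.round(simplex_R, 2))
```

In the one-factor matrix the correlation between the two lowest-loading variables, which sit next to each other, is 0.42, smaller than the 0.63 between the first and third variables, so size depends on the loadings and not on distance from the diagonal. In the simplex matrix every row falls away steadily from the diagonal (0.80, 0.64, 0.512 in the first row).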


The important point to note is that the possibility of a simplex structure and 
consequently a linear ordering of the variables can never be revealed by exploratory 
factor analysis. The method will impose a common factor structure on the data 
whatever its nature. So engrained has the factor model become in our thinking that 
whenever we encounter a set of highly correlated measures such as cognitive tests we 
assume a common factor or factors must have produced them. It is only a step from 
this to average the scores to produce a measure of general ability — the basis of the 
most common form of categorisation in education. 


Analysis of data from Richmond Basic Skills Test 

A useful illustration of the advantages of alternative models to the common 
factor one is given by an analysis of data from the Richmond Basic Skills Test 
(Hieronymous and Lindquist, 1975; France, 1975). This test is widely used in 
educational monitoring and is intended to provide a profile of cognitive skills 
measured by subtests in 11 areas. High inter-correlation among the subtests, often 
averaging above 0.60, leads some users to question whether the test can legitimately
be used to differentiate between the different forms of ability. It can be reasonably 
concluded that the only valid measure in the test is that of general ability which can 
be obtained by aggregating scores across the subtests.


For the purposes of this example, we examine data collected from 1,075 first
year boys and girls (11-12 years-old) in a large secondary school in Birmingham, 
England, who completed eight of the subtests from the Richmond Battery: 
1. Vocabulary, 2. Reading comprehension, 3. Spelling, 4. Use of capital letters, 
5. Punctuation, 6. Grammatical usage, 7. Mathematics concepts, 8. Mathematics 
problem solving. The correlation matrix for this set of tests is shown in Table 1.


TABLE 1
CORRELATIONS BETWEEN THE RICHMOND SUB-TESTS

                       1.    2.    3.    4.    5.    6.    7.    8.
Vocabulary         1.  1.00
Comprehension      2.  0.77  1.00
Spelling           3.  0.62  0.65  1.00
Capital letters    4.  0.56  0.61  0.66  1.00
Punctuation        5.  0.52  0.58  0.63  0.64  1.00
Usage              6.  0.64  0.69  0.65  0.62  0.61  1.00
Maths concepts     7.  0.61  0.62  0.58  0.55  0.54  0.53  1.00
Maths problems     8.  0.51  0.53  0.54  0.48  0.48  0.44  0.69  1.00


In order to evaluate different models to account for the correlations, the LISREL V
computer program (Jöreskog and Sörbom, 1981) was employed. This enables any
identified linear model, i.e., one in which the parameters are uniquely determined, 
to be estimated and tested against observed data. The parameters of the model, e.g., 
factor loadings, are either constrained to be zero or are free to vary and are 
estimated. The program yields maximum likelihood estimates of the parameters and
three goodness of fit measures for the fit of the model to the data: chi square (χ²);
a goodness of fit index measuring the relative amount of variance and covariance 
accounted for by the model with a range of 0 to 1 (GFI); and the root mean square 
residual derived from the residual covariance left unexplained by the model (RMS). 
It should be noted that a good fit on these various indices does not necessarily mean 
that the causal model is correct, simply that it is one possible causal model that fits 
the data; while other possible causal models are ruled out. Experimental methods 
are of course required to establish causal inference unequivocally. Moreover, 
comparison of the fit of a series of different models by, for example, comparing chi 
squared values relative to degrees of freedom can be carried out strictly only if the 
models are ‘nested’, i.e., each one differs from the previous one by having one or 
more of its constraints removed. All of the goodness of fit indices are presented for 
each of the models considered here. LISREL diagnostic information, which suggests 
places in the model where the fit is poorest, was also employed to improve the 
model. For readers unfamiliar with LISREL, Bynner and Romney (1985) provide an 
illustrative introduction to the topic. 
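

For orientation, the three fit measures can be sketched as functions of the observed matrix S and the matrix Σ implied by a fitted model. The formulas below are the standard maximum likelihood definitions as they are usually given for LISREL; they are assumed here and are not taken from this paper, and `fit_indices` is a hypothetical helper name. The sketch does not estimate any model: it only evaluates the indices for an implied matrix supplied by the user, here the independence model (an identity matrix) applied to the first three tests of Table 1.

```python
import numpy as np

def fit_indices(S, Sigma, n_obs):
    """Chi-square, GFI and root mean square residual for observed S and implied Sigma.

    Assumed definitions (standard ML forms):
      F    = ln|Sigma| + tr(Sigma^-1 S) - ln|S| - p
      chi2 = (N - 1) * F
      GFI  = 1 - tr[(Sigma^-1 S - I)^2] / tr[(Sigma^-1 S)^2]
      RMS  = root mean square of the residuals S - Sigma over the lower triangle
    """
    p = S.shape[0]
    A = np.linalg.solve(Sigma, S)            # Sigma^-1 S
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    F = logdet_sigma + np.trace(A) - logdet_s - p
    chi2 = (n_obs - 1) * F
    D = A - np.eye(p)
    gfi = 1.0 - np.trace(D @ D) / np.trace(A @ A)
    lower = np.tril_indices(p)               # includes the diagonal
    rms = np.sqrt(np.mean((S - Sigma)[lower] ** 2))
    return chi2, gfi, rms

# Correlations among tests 1 to 3 (vocabulary, comprehension, spelling), Table 1.
S = np.array([[1.00, 0.77, 0.62],
              [0.77, 1.00, 0.65],
              [0.62, 0.65, 1.00]])

# Independence model: no correlation between tests (a deliberately poor model).
print(fit_indices(S, np.eye(3), n_obs=1075))
# A model that reproduces S exactly gives chi2 = 0, GFI = 1 and RMS = 0.
print(fit_indices(S, S, n_obs=1075))
```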


(a) Factor analysis models 

Exploratory factor analysis of the correlation matrix in Table 1 points to a 
strong general factor in the data: the first principal factor of the tests accounts for
more than eight times as much of the variance of the tests as the second principal
factor, above the generally accepted criterion (usually six) for unidimensionality. However,
the slight evidence of clustering among the correlations on the principal diagonal of 
the matrix among the verbal tests on the one hand and the maths tests on the other 
could justify a two-factor rotated oblique solution. Such a rotation would also make 
good sense in terms of the many theoretical conceptions of cognitive functioning 
which separate verbal ability from mathematical ability (e.g., Thurstone, 1948). 
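

A rough check on the dominance of the first dimension can be sketched from the correlations in Table 1. Note the assumption: the code below uses principal components (eigenvalues of the correlation matrix with unities in the diagonal), not the principal factor method referred to in the text, so the ratio it prints need not match the figure of eight quoted above.

```python
import numpy as np

# Lower triangle of Table 1 (tests 1 to 8), row by row.
lower = [
    [1.00],
    [0.77, 1.00],
    [0.62, 0.65, 1.00],
    [0.56, 0.61, 0.66, 1.00],
    [0.52, 0.58, 0.63, 0.64, 1.00],
    [0.64, 0.69, 0.65, 0.62, 0.61, 1.00],
    [0.61, 0.62, 0.58, 0.55, 0.54, 0.53, 1.00],
    [0.51, 0.53, 0.54, 0.48, 0.48, 0.44, 0.69, 1.00],
]

# Build the full symmetric matrix from the lower triangle.
R = np.zeros((8, 8))
for i, row in enumerate(lower):
    R[i, : len(row)] = row
R = R + R.T - np.diag(np.diag(R))

# Eigenvalues in descending order; each is the variance of one component.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
ratio = eigenvalues[0] / eigenvalues[1]

print(np.round(eigenvalues, 3))
print(f"first/second eigenvalue ratio: {ratio:.2f}")
```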


Table 2 shows the result of fitting each of these two models, single factor
(model 1) and two-factor (model 2), to the test data. Each model was evaluated in
two forms: a highly constrained form exactly as specified by the model (a) and
secondly a less constrained form (with residuals between some of the variables free
to correlate as suggested by LISREL diagnostic information) to improve the fit (b).
Parameter estimates for the model 2b are shown in Figure 3. Standard errors and ‘t’
values produced in LISREL are only approximate for basic data in the form of
correlations (for a sample of this size a statistically significant parameter estimate,
0.01 level, will be in the order of 0.05 to 0.10). Note that the estimated correlation
between the two factors is 0.85, i.e., besides being highly statistically significant, it
suggests, in factor analytic terms, barely any separation between the two factors.


TABLE 2
GOODNESS OF FIT STATISTICS FOR FACTOR MODELS

Model 1a    χ² too large to be calculated
Model 1b    [values illegible; one entry reads 0.975]
Model 2a    [values illegible]
Model 2b    [values illegible]

Notes: Model 1a constrains all correlations between residuals to be zero.
Model 1b releases the constraints on three of the correlations between residuals.
Model 2a has no overlapping between the factors and no correlations between residuals.
Model 2b has overlap on variable 6 (usage) and releases the constraints on three of the
correlations between residuals.


FIGURE 3


It can be seen from Table 2 that the most constrained forms of each of the 
models show a very poor fit to the data: chi square relative to degrees of freedom is 
very large and the other indices similarly have poor values (for a sample of this size a 
‘good fit’ will be indicated by a ratio of χ² to degrees of freedom in the order of
2:1). The less constrained forms of the models show an improved fit, but there is
still much covariance in the matrix left unexplained. There is also the oddity of one 
factor loading for test 6 exceeding 1. This type of result, analogous in factor analysis 
parlance to a ‘Heywood case’, is a fairly common occurrence in LISREL, and 
suggests that the model is too constrained for the data. Further relaxation of 
constraints on the basis of the LISREL diagnostic information could improve the fit 
further, but doing much more of this in the absence of strong theoretical 
justification could lay one open to the charge of crude empiricism, not to say the risk 
of capitalising on chance. 


(b) Simplex models 

An alternative way of modelling the structure of relationships among the tests is 
in terms of a simplex. As we saw earlier, this is a linear order in which each ability 
may be considered to influence the next and in which the strength of the influence is 
shown by a path coefficient (standardised partial regression coefficient). The path 
coefficients can be in effect ‘corrected for attenuation’ in LISREL by specifying and 
estimating a random measurement error component for each variable (test score) in 
the model (see Jöreskog and Sörbom, 1981, ch. III, 70). We undertake the fitting of
the simplex model to the data in two stages, first for the verbal tests (1-6) and then 
for the maths tests (7, 8) as well. Inspection of the correlations for the verbal tests 
points to a simplex ordering with test 6 (usage) relocated between test 2 
(comprehension) and test 3 (spelling). To show the effect of the order of the tests on 
the goodness of fit of the simplex model, Table 3 gives results for the tests in their 
original order (model 1) and with test 6 relocated in the model (model 2). It is clear 
that a very good fit to the data is obtained for model 2. Parameter estimates are 
shown in Figure 4. 
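

The effect of relocating test 6 can be checked informally against the simplex pattern of Figure 2b before any model is fitted. The sketch below is not the LISREL analysis reported here; it simply counts, for the six verbal tests in Table 1, how many adjacent pairs of correlations break the rule that values should not increase when moving away from the principal diagonal. The function name `simplex_violations` is our own.

```python
import numpy as np

# Correlations among the verbal tests 1 to 6 (lower triangle of Table 1).
lower = [
    [1.00],
    [0.77, 1.00],
    [0.62, 0.65, 1.00],
    [0.56, 0.61, 0.66, 1.00],
    [0.52, 0.58, 0.63, 0.64, 1.00],
    [0.64, 0.69, 0.65, 0.62, 0.61, 1.00],
]
R = np.zeros((6, 6))
for i, row in enumerate(lower):
    R[i, : len(row)] = row
R = R + R.T - np.diag(np.diag(R))

def simplex_violations(R):
    """Count adjacent pairs in the upper triangle that rise away from the diagonal."""
    n = R.shape[0]
    count = 0
    for i in range(n):                 # along each row, moving right
        for j in range(i + 1, n - 1):
            if R[i, j + 1] > R[i, j]:
                count += 1
    for j in range(n):                 # down each column, towards the diagonal
        for i in range(j - 1):
            if R[i, j] > R[i + 1, j]:
                count += 1
    return count

original_order = [0, 1, 2, 3, 4, 5]    # tests 1, 2, 3, 4, 5, 6
relocated_order = [0, 1, 5, 2, 3, 4]   # tests 1, 2, 6, 3, 4, 5

for label, order in [("original", original_order), ("relocated", relocated_order)]:
    reordered = R[np.ix_(order, order)]
    print(label, simplex_violations(reordered))
```

With the tests in their original order the count is 6; with usage moved between comprehension and spelling it falls to 0, which is consistent with the ordering adopted for model 2.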


TABLE 3
GOODNESS OF FIT STATISTICS FOR SIMPLEX MODELS (VERBAL TESTS)

Model 1
Model 2

Notes: Model 1 has variables in their original order (see Table 1).
Model 2 has variable 6 (usage) relocated between variable 2 (comprehension) and
variable 3 (spelling).


Key: E = test corrected for attenuation, T = test, e = residual
measurement error variance, z = residual variance.


(c) Extended simplex models 

We now extend the model to bring in the two mathematics tests. Results for two
versions of the model are shown in Table 4. First the maths tests are added as an
extension to the simplex for the verbal tests (model 1a) without any correlations
between residuals. The next model (model 1b) includes a correlation between the
residuals of test 7 (maths concepts) and test 6 (usage), as suggested by the LISREL
diagnostic information. From Table 4, it can be seen that model 1b, the simplex with
the correlation between residuals, provides the best fit for the data. The parameter
estimates for the model are shown in Figure 5.


TABLE 4
GOODNESS OF FIT STATISTICS FOR EXTENDED SIMPLEX MODELS
(VERBAL AND MATHS TESTS)

             χ²      df    GFI      RMS
Model 1a    89.87    15    0.976    0.025
Model 1b    30.59    14    0.991    0.012

Notes: Model 1a has no correlations between residuals.
Model 1b releases the constraint on the correlation between the residuals of tests 6 and 7.


FIGURE 5
EXTENDED SIMPLEX MODEL 1b

[Path diagram; residuals appear in the order e1, e2, e6, e3, e4, e5, e7, e8.]


(d) Split simplex models

Although the extended simplex with the maths tests included and corre- 
lated residuals produces a reasonable fit, it presents theoretical difficulties. First, 
the negative correlation between the residuals of positively correlated tests is 
difficult to interpret; it suggests that the model is distorting empirical realities. 
Secondly, the model is too restrictive in suggesting that all verbal skills precede 
mathematical skills rather than develop to a certain extent in parallel with them. 
Moreover, the relatively high correlations among test 7 (math concepts), test 1 
(vocabulary) and test 2 (reading comprehension) (Table 1) and the finding from 
LISREL (not tabulated here) of relatively high residual correlations greater than
0.10 among these variables (i.e., correlation not accounted for by the model)
suggest that they would be better linked in the model rather than linking test 7
(math concepts) to test 5 (punctuation). A model which incorporates all these 
features, termed for convenience a ‘split simplex’, contains two simplexes, tests 1-6 
(verbal skills) and tests 7, 8 (math skills) originating from a single skill, test 1
(vocabulary). Table 5 gives the results of fitting this model to the data in two forms:
first without any links between the two types of tests (model 1a) and secondly with 
freed correlations between the residuals of test 7 (maths concepts) with test 6 (usage) 
and test 3 (spelling); and test 8 (maths problems) with test 3 (spelling) — model 1b. 
An excellent fit is obtained for model 1b. Parameter estimates are shown in Figure 6. 
Note that this time all residual correlations are positive; and all other parameters are
readily interpretable.


TABLE 5
GOODNESS OF FIT STATISTICS FOR SPLIT SIMPLEX MODELS
(VERBAL AND MATHS TESTS)

             χ²      df    GFI      RMS
Model 1a    99.62    16    0.970    0.030
Model 1b    23.26    13    0.992    0.010

Notes: Model 1a has no correlations between residuals.
Model 1b releases the constraints on the correlations between the residuals of
tests 7 and 6, tests 7 and 3, and tests 8 and 3.


FIGURE 6
SPLIT SIMPLEX MODEL 1b


DISCUSSION

All types of model, factor, simplex, and split simplex, are highly subject to 
sampling fluctuation, but replication on later samples of children from the same 
school suggests that they are reasonably stable, i.e., the order of magnitude of factor 
loadings and the simplex ordering of tests does not vary much between samples. It is 
therefore reasonable to consider substantive interpretations of the different models. 


The factor model implies underlying ‘latent traits’ generating the various
abilities the tests measure. Nothing is implied or can be said about the way these 
various abilities might influence each other. The model draws one inevitably 
towards the conception of cognitive ability as a relatively fixed characteristic present 
from birth in the terms that such writers as Jensen (1980) describe it. 


It is interesting to contrast such an interpretation of the structure of ability with 
the one suggested by our final split simplex model (Figure 6). Here we have a picture
of dynamic change in two linear sequences in which each ability feeds the next and 
with connections between certain abilities from one sequence to the other overriding 
the linear separation. It makes good interpretive sense, for example, to see the 
correlations between the residuals of usage and spelling with those of the maths 
concepts test as signifying a special connection between these types of skill. 


Existence of a simplex ordering among tests, suggesting a causal sequence 
between the abilities they measure, says nothing of course about the direction of 
causation; the arrows between the tests in Figure 5 could just as easily be reversed. 
Support for the direction shown comes from another source: all children completed 
a verbal reasoning test before entering the school where the Richmond test data were 
collected. It is reasonable to assume that the tests with the highest correlation to 
verbal reasoning are the nearest to it and thus prior to others in the causal order. The 
highest correlations were in fact with vocabulary, comprehension and maths 
concepts (Table 6), which is in line with the causal ordering shown in Figure 6. 


TABLE 6 
CORRELATION OF RICHMOND TEST WITH VERBAL REASONING SCORE 


Verbal Reasoning


This order in the two sequences also makes good interpretive sense in
educational terms. Vocabulary can be seen as lying at the core of both types of skill, 
each skill following vocabulary in the linear order depending on it to a certain 
extent. But vocabulary is itself growing continually as are all the other skills in 
response to the education process. The attraction of the model is that it suggests a 
flow of influence from improvements in one skill to the next — what every teacher 
would like to believe occurs but what theoreticians have discouraged them from 
believing on the grounds of traditional factor analytic ‘evidence’. 


The analysis of the Richmond Basic Skills Tests gives striking evidence of the 
value of alternative models accounting for correlations among test scores to the 
common factor model. The simplex is a marked improvement on the single factor
model for the verbal tests and the split simplex again fits the data better than the
two-factor model for the whole set of eight tests. 
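

The fit statistics reported in Tables 4 and 5 can be set against the rule of thumb quoted earlier (a ratio of χ² to degrees of freedom in the order of 2:1) with a few lines of arithmetic. The sketch below also forms the χ² difference for each nested pair (1a against 1b within each table), which is the comparison the text describes as strictly legitimate only for nested models. The tail probabilities use closed forms that hold for 1 and 3 degrees of freedom only; the function names are our own.

```python
import math

# (chi-square, degrees of freedom) as reported in Tables 4 and 5.
fits = {
    "extended simplex 1a": (89.87, 15),
    "extended simplex 1b": (30.59, 14),
    "split simplex 1a": (99.62, 16),
    "split simplex 1b": (23.26, 13),
}

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("only df = 1 or df = 3 are handled in this sketch")

# Ratio of chi-square to degrees of freedom for each model.
ratios = {name: chi2 / df for name, (chi2, df) in fits.items()}

def nested_difference(restricted, relaxed):
    """Chi-square difference, its degrees of freedom and tail probability."""
    (c0, d0), (c1, d1) = fits[restricted], fits[relaxed]
    return c0 - c1, d0 - d1, chi2_sf(c0 - c1, d0 - d1)

for name, ratio in ratios.items():
    print(f"{name}: chi-square/df = {ratio:.2f}")
print(nested_difference("extended simplex 1a", "extended simplex 1b"))
print(nested_difference("split simplex 1a", "split simplex 1b"))
```

On these figures the two constrained (1a) models have ratios of about 6, the extended simplex 1b is a little above 2, and the split simplex 1b is below 2; in both nested comparisons the drop in χ² is very large relative to the degrees of freedom given up.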


This result may be seen as encouraging to those who believe there is educational 
and theoretical merit in maintaining clear distinctions between verbal abilities and 
mathematical abilities test scores while recognising the dynamic relationship 
between them. Although the correlation between these scores may be considerably 
reduced when the common factor is partialled out, the notion that they reside 
basically in the same fixed attribute (intelligence) — the ‘strength’ model — does 
insufficient justice to the rich and complex teaching and learning process that has 
given rise to them. Teachers will benefit from a greater realisation that 
enhancements in one skill will carry over to another and will also benefit quite 
different types of skill. Facilitating, reinforcing and catalysing these processes may 
be seen to lie at the core of good teaching.


IMPACT OF SOCIOMETRIC METHOD AND ACTIVITY
CONTENT ON ASSESSMENT OF INTERGROUP RELATIONS 
IN THE CLASSROOM 


By JOSEPH SCHWARZWALD, TAL LAOR, AND MICHAEL HOFFMAN
(Department of Psychology, Bar-Ilan University, Ramat Gan, Israel) 


Summary. The study compared patterns of ethnic and sex relations revealed by 
nomination versus rating methods for activities at differing levels of intimacy. Integrated 
junior high school students in 16 classrooms filled out two versions of the Interpersonal 
Relationship Assessment Technique treating willingness to engage in activities of low, 
moderate, and high intimacy. In the peer nomination version, students identified three 
classmates most desired as partners for each activity. In the rating method, students 
stated positive desire for each classmate. As hypothesised, the nomination technique 
heightened the extent of group cleavage along sex and ethnic dimensions relative to the 
rating method. Moreover, it was insensitive to differences in sociometric content. In 
contrast, the rating method revealed, as expected, reduced social cleavage at less 
demanding levels of social contact. Discussion focused on the theoretical processes 
underlying the different methods and their practical implications for the assessment of 
intergroup relations. 


INTRODUCTION 


THE mixed results of integration efforts in the United States (Schofield, 1978) and
Israel (Schwarzwald and Amir, 1984) have raised important questions regarding the 
sociometric techniques used to appraise intergroup relations. To date, two major 
forms of sociometric measures have been used extensively to evaluate intra- and 
inter-group relations: limited peer nomination and the whole group rating (Hartup, 
1970; Foster and Ritchey, 1979; Miller and Gentry, 1980). In the former, subjects 
select a small number of group members for specified joint activities (e.g., play or 
work) or roles (e.g., friend), whereas, in the latter, subjects report the desirability of 
every group member. The question has arisen whether the variations between these 
two techniques have contributed to the inconsistencies in evaluation outcomes. 


Specific concerns have been expressed regarding the utility of the limited 
nomination techniques. Psychometric criticisms have been directed toward their 
reliability and stability (e.g., Holland and Leinhardt, 1973; Hallinan, 1976;
Singleton and Asher, 1977). Moreover, substantive questions have arisen regarding 
their capacity to reveal the true extent of intergroup acceptance (Cohen, 1975; 
Schofield, 1978; Schwarzwald and Cohen, 1982). Specifically, it has been argued 
that nomination techniques are insensitive to subtle gradations in the level of inter- 
personal relations. 


Beyond their psychometric differences, it appears that the two forms of 
sociometric techniques may yield differing conclusions regarding intergroup 
preferences in the classroom. Schofield and Whitley's (1983) recent meta-analysis of 
18 inter-racial studies indicated that the degree of same-group preferences was 
strikingly more pronounced in studies using the nomination techniques. Similar 
results appeared when Schofield and Whitley directly assessed elementary school 
students’ same-race and same-sex preferences using the two techniques. Same-race 
preferences appeared more strongly under the nomination technique than
under whole group rating. However, the two methods indicated equally pronounced
same-sex preferences.


One possible conclusion from Schofield and Whitley’s work is that the nomination
technique, in spite of its psychometric faults, is actually more sensitive to
group preferences than the whole group rating technique. However, perusal of the 
techniques themselves and the relations explored suggests quite the opposite 
conclusion regarding their relative sensitivity. In particular, we would argue that the 
nomination technique is insensitive to differential patterns of group preference 
arising at varying levels of intimacy. In contrast, the whole class rating technique is 
more sensitive to these differential patterns at each level. 


Implicit in this argument are two assumptions. The first deals with the focus of 
the two techniques. Peer nomination techniques are assumed to assess friendship 
preferences, regardless of the activity content addressed in the questionnaire. In 
other words, a respondent who is asked “Who are the three people you most like to
have as ‘X’?” will nominate the same close friends, whether X is replaced by
“playmate”, “workmate”, “confidante” or any other role activity. In contrast,
rating techniques assess the range of interpersonal acceptance and are directly 
influenced by the type of activities addressed. One's willingness to have another as a 
partner varies from activity to activity as a function of the interpersonal 
commitment required. 


A second assumption deals with the underlying relationship between level of 
same-group preference and the intimacy of relationships surveyed. The social 
distance literature (Stein et al., 1965) suggests that same-group preferences become 
more pronounced with increasing intimacy. For example, subjects may not have any 
same-group preferences in regard to seating partners, yet still have very distinct 
preferences as to the guests they invite home. 


Taken together, these two assumptions lead one to expect that sociometric 
technique and intimacy will interact in determining revealed same-group preference. 
Nomination techniques will always point to high same-group preferences, since they 
inherently tap high intimacy friendships regardless of the specific activity addressed 
in the questionnaire. Yet rating techniques, with their sensitivity to relative intimacy 
levels, are more likely to reflect the graduated rise in same-group preference with the 
rise in intimacy. As a result, rating techniques will point to lower levels of same- 
group preference than the nomination technique for all but the most intimate 
activities. 


Schofield and Whitley's data gloss over the possible interaction of method and 
intimacy, since their subjects were only asked to nominate or rate peers for work and 
play roles involving low or moderate levels of intimacy. They did not ask their 
subjects to report preferences with regard to roles and activities of a more 
demanding nature. As such, it is still unclear whether the two techniques diverge in 
their capacity to reveal same-group preference across the range of interpersonal
relations. 


The present study extends Schofield and Whitley’s work by methodologically 
disentangling the issue of data-gathering techniques from that of the content or 
intimacy of accessed relations. It takes advantage of the unique structure of the 
recently developed Interpersonal Relationship Assessment Technique (IRAT) in 
which subjects state their willingness to share varying activities with given partners 
(Schwarzwald and Cohen, 1982). Repeated findings (Schwarzwald, Moisseiev, and 
Hoffman, in preparation) indicate that IRAT activities form a unidimensional scale 
of monotonically rising intimacy or acceptance, ranging from the casual to the 
committed. In the present study, IRAT items of low, medium, and high intimacy are 
presented to subjects in parallel nomination and whole group rating formats. 
Analysis focuses on two types of known same-group preferences: sex and ethnicity. 

METHOD

Subjects 

The sample consisted of students in 18 homeroom classes drawn from six 
integrated junior high schools located in the urban centre of Israel. One class was 
selected randomly at each school from those of the seventh, eighth, and ninth 
grades. Each class contained from 34 to 42 students. Overall, the sample was made 
up of approximately equal numbers of males and females (50.2 per cent and 49.8
per cent). 


In 16 classrooms, the percentage of Western students ranged between 35 and 63 
per cent with an average of 49 per cent. The negligible percentage of Western 
students (less than 10 per cent) in the two remaining classes cast doubt on their 
integrated status and led to their exclusion. As a result, the final sample consisted of 
281 male and 237 female students. 


Measures 

The Interpersonal Relationship Assessment Technique developed by 
Schwarzwald and Cohen (1982) was employed. The scale included the following 
activities in random order. In order of increasing intimacy, these consisted of: 


(a) To lend him/her a book or pen in class. 

(b) To play with him/her during recess. 

(c) To sit next to him/her in class. 

(d) To do homework with him/her after school. 
(e) To be his/her best friend. 

(f) To reveal personal secrets to him/her. 


The relative level of intimacy for each item was determined on the basis of 
scalogram analyses performed on IRAT data collected from junior high school 
students (Schwarzwald and Cohen, 1982; Schwarzwald et al., in preparation). This 
analysis showed that IRAT items form a unidimensional, monotonically rising scale 
with high coefficients for reproducibility (0.92), scalability (0.73), and reliability
(0.91).


The IRAT questionnaire was formulated in two versions: 


1. Rating — Respondents were asked whether they would or would not wish 
to engage in each of the IRAT activities for each target classmate appearing 
in an alphabetic roster printed on the form. 


2. Nomination — Respondents were asked to list, for each activity, the three 
classmates they most wished as partners. Respondents replied using 
classmates' list number in the class roster appearing on the form. 


Procedure 

Students responded to both versions of the IRAT during a single classroom 
session which lasted from 20 to 35 minutes. At the same time, subjects recorded their 
ethnic background on a cover sheet. Student track level in mathematics, a measure 
of academic status, was subsequently taken from school records. 


Data reduction 

Data reduction focused on the creation of a comparable group preference index 
for the whole roster and nomination techniques which was unbiased for the differing 
sizes of target groups within a given class. The resulting index $P_{ij}$ reflected the
proportion of endorsements given to target group i by individual j, adjusted for the
size of target groups in j's classroom. This was calculated by the following formula:

$$P_{ij} = \frac{N_{ij}/I_{ij}}{\sum_{i} (N_{ij}/I_{ij})} \times 100$$

In this formula, $N_{ij}$ represents the number of members in target group i endorsed by
j, and $I_{ij}$ equals the number of target group i members in j’s classroom.
Multiplication by 100 produced centile scores.


In order to address activity content, separate indices were calculated for 
endorsements at three levels of intimacy. The six IRAT items were divided into two- 
item sets of low, medium and high intimacy in accord with the scale positions noted 
above. Separate group preference indices were calculated for endorsements in each 
two-item set. 


One important characteristic of these indices must be noted. The preference 
indices for the different target groups are not independent, since their sum is always 
equal to 100 per cent. As a result, analyses related to sex or ethnic preferences were 
performed on the index score for only one of the two related target groups, since the 
results for the second group would only be complementary. 
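
The index defined above can be illustrated with a minimal Python sketch. The function name `preference_index`, the dictionary layout, and the example class composition are hypothetical and are not taken from the study; the sketch also assumes that the group sizes passed in already reflect whoever counts as an eligible target for the respondent.

```python
def preference_index(endorsed, group_sizes):
    """Return the preference index (per cent) for each target group, for one respondent.

    endorsed[g]    -- number of group-g classmates endorsed by the respondent (N)
    group_sizes[g] -- number of group-g members available as targets (I)
    """
    # Endorsement rate per group, corrected for the size of that group.
    rates = {g: endorsed[g] / group_sizes[g] for g in endorsed}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("respondent made no endorsements")
    # Normalise so that the indices for all target groups sum to 100.
    return {g: 100 * rate / total for g, rate in rates.items()}


# Hypothetical respondent: endorses 2 boys and 1 girl in a class
# offering 19 boys and 20 girls as possible targets.
scores = preference_index({"male": 2, "female": 1}, {"male": 19, "female": 20})
print(scores)
```

Because each respondent's indices are normalised to sum to 100, the score for one target group fully determines the score for the other, which is why only one of the two was analysed.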


RESULTS 


Separate three-way ANOVAs were conducted to address the issue of same- 
group preference along sex or ethnic characteristics. In these, intimacy of content 
and sociometric method were treated as repeated factors, and respondents’ group 
characteristic as a between-subjects variable. 


Cross- and intra-sex relations 

The three-way ANOVA for male target preference indicated that all main 
effects and interactions were significant. These effects can be addressed most 
succinctly by focusing on the highly significant triple order interaction of sex by 
method by intimacy (F(2, 924) = 214.74, P < 0.0001). Table 1 presents the group
preference means for male targets who were involved in this analysis. The 
complementary preference scores for female targets are included for purposes of 
contrast. 


TABLE 1

MEAN SEX PREFERENCE, BY SEX OF TARGET AND RESPONDENT,
DATA COLLECTION METHOD, AND LEVEL OF INTIMACY

                                    Rating                       Nomination
Sex of        Sex of               Intimacy                       Intimacy
Respondents   Targets      Low    Moderate   High         Low    Moderate   High
Male          Male        58.53    70.88    76.99        79.50    75.37    75.43
              Female      41.47    29.12    23.01        20.50    24.63    24.57
Female        Male        39.06    26.08    14.78        12.48    13.85    13.67
              Female      60.94    73.92    85.22        87.52    86.15    86.33





Inspection of the table reveals strong same-sex preferences among both boys 
and girls for both the rating and nomination techniques. Close perusal indicates a 
dramatically differing effect of intimacy level on target group preference in the 
rating and nomination techniques. With the rating technique, same-sex preference 
increases linearly in magnitude with rising intimacy for both males (F(1, 924) =
22.72, P < 0.01) and females (F(1, 924) = 29.29, P < 0.01). By contrast, with
the nomination technique, same-group preference remains at a similarly high level 
across differing degrees of intimacy. Moreover, only in the case of high intimacy 
activities do same-group preferences found with the rating technique attain a 
parallel level to that yielded by the nomination technique. 
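
The convergence at high intimacy can be checked against the same-sex means reported in Table 1. The following Python sketch is illustrative only; the names `same_sex` and `method_gap` are hypothetical, and the calculation is a simple difference of reported means, not a reanalysis of the data.

```python
# Same-sex preference means from Table 1, ordered low, moderate, high intimacy.
same_sex = {
    "male": {"rating": [58.53, 70.88, 76.99], "nomination": [79.50, 75.37, 75.43]},
    "female": {"rating": [60.94, 73.92, 85.22], "nomination": [87.52, 86.15, 86.33]},
}


def method_gap(means):
    """Nomination minus rating at each intimacy level, rounded to two decimals."""
    return [round(n - r, 2) for r, n in zip(means["rating"], means["nomination"])]


gaps = {sex: method_gap(means) for sex, means in same_sex.items()}
print(gaps)
```

For both sexes the gap between the two methods is largest for low intimacy items and shrinks to under two points for high intimacy items.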


Cross- and intra-ethnic relations 

The 2 (respondent origin) x 2 (method) x 3 (intimacy level) ANOVA of 
preference for Middle Eastern targets revealed all main effects to be significant, as 
well as the two-way interaction between respondent origin and method and the triple 
interaction. Analysis of this highly significant origin by method by intimacy 
interaction (F(2, 904) = 5.85, P < 0.003) sheds light on the broader pattern of
effects. The preference means for Middle Eastern targets are given in Table 2, along 
with the complementary means for Western targets. 


TABLE 2

MEAN ETHNIC PREFERENCE, BY RESPONDENT AND TARGET ORIGIN,
DATA COLLECTION METHOD, AND LEVEL OF INTIMACY

                                         Rating                       Nomination
Respondents'    Targets'
Ethnic          Ethnic                  Intimacy                       Intimacy
Origin          Origin            Low    Moderate   High         Low    Moderate   High
Western         Western          53.64    56.11    60.87        65.08    65.84    66.27
                Middle Eastern   46.36    43.89    39.13        34.92    34.16    33.73
Middle Eastern  Western          48.42    49.25    50.26        45.11    45.97    48.41
                Middle Eastern   51.58    50.75    49.74        54.89    54.03    51.58


Strikingly different preference patterns appear among Western and Middle 
Eastern respondents. Westerners display a distinctive preference for same-group 
members, whereas their Middle Eastern counterparts do not. These ethnic 
differences reappear in regard to the interaction of sociometric method and level of 
intimacy. Among Middle Easterners, neither method nor intimacy significantly 
alters the basic pattern of limited ethnic cleavage. Yet among Westerners, same- 
group preference varied markedly with method and level of intimacy. In the 
nomination technique, Westerners display a high level of same-group preference, 
regardless of the specific activity assessed. In contrast, trend analysis of rating 
method data indicates that Westerners’ same-group preference became accentuated 
with increasing intimacy (F(1, 904) = 9.18, P < 0.001). The degree of same-group
preference in the rating method only rose to the levels indicated by the nomination 
technique for the high intimacy activities. 


DISCUSSION 


The study findings are in concordance with the contention that the assessed 
degree of social cleavage is contingent upon the sociometric technique employed and 
the types of activities surveyed. Wherever sex or ethnic preferences appeared, the 
apparent level of cleavage was greater using the restricted nomination technique 
than when whole group rating was employed. Where underlying preference was 
limited or absent, as in the case of Middle Easterners’ ethnic choices, both 
techniques evoked similar outcomes. 


As expected, an interaction appeared between sociometric technique and 
activity content for both sex and ethnicity analyses. On the one hand, the degree of 
same-group preference revealed by the whole group rating technique was linearly
related to the degree of item intimacy or social commitment: with greater intimacy, 
same-group preference was more pronounced. On the other hand, the nomination 
technique was insensitive to the differing gradations of item intimacy, pointing to 
the same high levels of cleavage across activities. Only at the highest levels of 
intimacy did results obtained with the two techniques converge. 


These findings support the theoretical contentions that although same-group 
preferences increase with intimacy, the two techniques differ in their capacity to 
detect this tendency. The nomination technique appears to tap preferences reflecting 
more intimate friendships, regardless of the specific type of activity actually being 
surveyed. The whole group rating technique, by distinction, appears to be a more 
accurate mirror of differential interpersonal acceptance. 


Comparison of these results with those reported earlier by Schofield and 
Whitley (1983) suggests that the difference between the two methods is contingent
on the extent of underlying intergroup cleavage. In Schofield and Whitley’s work, 
the two sociometric methods indicated convergent high levels of sex cleavage, even 
on the moderate intimacy items they employed. Yet in the present study, the 
nomination technique yielded higher sex cleavage results than the rating technique 
for all but the high intimacy activities. These differences appear to reflect the 
observation that elementary school children display greater same-sex preference 
across a wider range of activities than junior high school students (e.g., Waller, 
1967). In short, where underlying cleavage is extremely high both techniques offer 
equal assessments. 


The present study findings regarding Western and Middle Eastern ethnic
preferences extend this conclusion to the case where underlying cleavage is very low 
or absent. Recent investigations (Schwarzwald and Amir, 1984; Schwarzwald, Fridel 
and Hoffman, in press) indicate that Middle Easterners, unlike their Western 
counterparts, do not display same-group preferences. This lack of cleavage appears 
to be the cause for the converging results between the differing sociometric methods 
across the range of intimacy. 


It is interesting to speculate as to why the two techniques are differentially 
responsive to activity content. One surface explanation may be found in the wording 
of the nomination technique. For example, requesting a respondent to select the 
most desirable partner focuses attention more on the issue of ‘‘desirable’’ partners 
rather than on the specific activity at hand. 


An alternate path of explanation is rooted in possible distinctions between the 
underlying judgmental processes evoked by the two methods. In the rating 
technique, the respondent attends to each individual separately and thus can 
concentrate on each activity under consideration. By contrast, the nomination 
technique involves interpersonal comparison, since it forces some partners to be 
included or excluded at the cost of others. As a result, attention is drawn away from 
activity considerations, while dimensions of individual difference (e.g., sex, race, 
status) or desirability are made more salient. As such, arguments by Kelly (1963) 
regarding the centrality of the friend/enemy dimension among individuals’ personal 
constructs may explain the particular affinity for friendship cliques in the limited 
nomination technique. 


Although preliminary in nature, these analyses point to the need for greater 
research into the specific quantities these tools measure, as well as to the underlying 
reasons for these distinctions. Yet, the study findings in the interim would appear to 
have important implications for the selection or construction of appropriate 
assessment tools as well as the interpretation of previous research results. Clearly, 
the rating technique appears to be the method of choice whenever one is interested in
interpersonal relations at various levels of intimacy. For example, studies tracing age 
or intervention related developments in intergroup relations frequently assume that 
change occurs at low levels of intimacy before it arises at higher levels of social 
commitment. The rating method, due to its greater sensitivity, is the only one of the two
measures capable of detecting these changes at the lower levels while discriminating 
the absence of change at higher levels. This use, of course, requires the inclusion of a 
suitable range of activities. By distinction, the nomination technique appears to be a
viable alternative only when one is interested in close friendships or similar levels of 
social commitment. Here, the researcher can include almost any specific activity 
contents, since similar results will be obtained in any case. Yet the psychometric 
limitations of the nomination technique, noted in the introduction, cast doubt on 
even this limited possibility. 


One central study implication is that caution must be employed in interpreting 
past findings. The limited nomination technique, by its emphasis on intimate bonds, 
tends to underestimate interpersonal acceptance at lower levels of commitment more 
than the rating method. Thus, in research comparing intergroup relations over time 
or treatment, greater credence should be given to reports of differences obtained 
with the nomination technique as opposed to the rating technique. In contrast, 
greater weight should be given to findings of no significant change when the rating 
method rather than the nomination technique is employed. 


In a similar fashion, the whole group rating technique’s dependence on item 
content also requires interpretive caution. Rating tools which employ high intimacy 
items underestimate interpersonal acceptance at the lower range of involvement, just 
as rating tools which employ low intimacy items tend to overestimate acceptance at 
the higher levels. As such, greater credence should be attached both to significant 
differences obtained from rating tools using higher intimacy items as well as to the 
absence of differences with rating tools addressing more casual or less demanding 
items. 


THE STABILITY AND CLASSIFICATION OF SPECIFIC
READING RETARDATION: A LONGITUDINAL STUDY 
FROM AGE 7 TO 11 


By DAVID L. SHARE AND PHIL A. SILVA
(Dunedin Multidisciplinary Health and Development Research Unit, 
University of Otago, New Zealand) 


Summary. The concept of specific reading retardation assumes that there exists a stable 
group of under-achieving children whose classification is not simply age- or test-specific. 
This assumption was investigated longitudinally in a large sample of New Zealand 
children who were followed from age 7 to age 11. Substantial overlap was found between 
groups classified as specific reading retarded at ages 7, 9, and 11 on the basis of Burt 
reading and WISC-R Performance IQ scores. At age 11, there was also substantial 
agreement between classifications based on three separate measures of reading 
achievement, although there was evidence of test-specific factors unique to
comprehension- versus word recognition-based classifications. Agreement between
classifications at age 11 based on WISC-R Performance IQ and Fullscale IQ was high. The data
indicated the existence of a fairly stable group of children characterised by enduring 
under-achievement in reading throughout the primary years of schooling. 


INTRODUCTION 

RESEARCH into specific reading disabilities has been plagued by difficulties in
operationalising under-achievement. Traditional indices, such as difference scores and
achievement quotients, have been shown to be either unreliable (owing to regression 
effects), age-biased or IQ-biased (Phillips, 1961; Thorndike, 1963; Wilson et al., 
1983; Yule, 1984). Thorndike (1963) and Yule and his associates (Yule, 1973, 1984; 
Yule et al., 1974; Rutter and Yule, 1975) have advocated a regression-based
definition of under-achievement which overcomes these flaws. In this approach, 
under-achievement is based on the discrepancy between a child's actual reading 
score and a predicted reading score. The predicted reading score is derived by means 
of a multiple regression equation from a child's age and IQ. Research into the 
characteristics of groups of children classified by this method as specific reading 
retarded has employed a variety of tests and age groups (cf. Rutter and Yule, 1975;
Jorm et al., in press; Silva et al., 1985). Implicit in this method of classifying 
children is the assumption that there exists a reasonably stable group of specific 
reading retarded children whose classification is not merely age- or test-specific. 


Only one longitudinal study appears to have reported data on the question of 
stability of classification over time. Jorm et al. (in press) reported that although
test-retest correlations between end of Grade 1 (age 7) and end of Grade 2 (age 8) were
high (Neale Accuracy 0.88, Neale Comprehension 0.77), stability of classification
was low. Of 24 children classified as specific reading retarded at Grade 2 using a
cut-off of 1.5 standard errors, only six had been classified as specific reading retarded at
Grade 1. Jorm et al. cautioned that because classification was based on an arbitrary
cut-off along a continuum, small changes in actual scores may have moved children 
across category boundaries. Data reported by Share et al. (1986) on the effects of 
positive skew on extreme residual scores suggest that the small number of children 
(N = 12) classified as specific reading retarded in the Jorm et al. study was probably 
due to the floor in reading scores at Grade 1. There is now strong evidence (Share et 
al., 1986; Van der Wissel and Zegers, 1985) that the number of children classified as 
specific reading retarded according to a standard error-based cut-off is simply a 
function of the skew in reading scores. For this reason, the Jorm et al. data do not
adequately address the issue of stability. Share et al. have argued that this problem 
can be avoided by simply setting a cut-off (e.g., bottom 5 per cent) based on a 
ranking of residual reading scores — actual reading score minus predicted score. 


As yet, there appears to be no evidence on whether classification is test-specific. 
There are at least two lines of argument that suggest that the regression method may, 
in part, be test-specific. In the work of Yule and his associates (Yule, 1973; Yule et 
al., 1974), reading comprehension and word recognition measures have been used 
interchangeably as measures of a common construct. Jorm et al. (in press) used a
composite measure based on word recognition, reading comprehension, and
reading rate. Despite the typically high correlations between word recognition and 
reading comprehension, there may be subgroups of children, such as specific 
comprehension disabled (Cromer, 1970) or hyperlexic (Healy, 1982), with a 
dissociation between word recognition and reading comprehension. As yet, there 
appears to be no evidence on whether comprehension and word recognition based 
measures both yield a common group of specific reading retarded children. 


A second point relevant to the issue of test-specificity of classification is the 
nature of the IQ measures employed. Although most definitions of reading 
disability are based on tests with a substantial verbal component (e.g., WISC-R), 
some authors (e.g., Stanley et al., 1982) have argued that only non-verbal measures
are appropriate since these children typically show verbal deficits. As yet, there 
appears to be no published evidence on the consistency of classification based on 
verbal versus non-verbal IQ scores. 


In addition to theoretical considerations, the question of stability of 
classification over tests and time has considerable practical relevance to the issue of 
the early identification of disabled readers. Remedial reading instruction usually
commences around age 9 (see, e.g., Shearer, 1967; Carroll, 1972; Yule, 1976) owing 
largely to the popular assumption that identification of children at an earlier age is 
unreliable. Advocates of early identification have argued that reading achievement 
as early as 7 years of age correlates very highly with later reading achievement 
(Butler et al., 1982; Williams and Silva, 1985). But as the Jorm et al. findings
indicate, a high correlation may not necessarily imply stable classification. It is 
necessary to show that children identified at an early age are, by and large, the same 
group who would normally receive remedial services at a later age. 


This study examined the question of stability of classification across three ages 
— 7, 9, and 11, and (at age 11) across three reading tests in a large sample of 
Dunedin children. 


METHOD 

Sample 

The sample consisted of those 11-year-old children enrolled in the Dunedin 
Multidisciplinary Health and Development Study. The setting, selection, and 
characteristics of the Dunedin sample have been described in detail by McGee and 
Silva (1982). Briefly, these children were part of a cohort born between 1st April,
1972 and 31st March, 1973 at Queen Mary Hospital. The children were first traced 
at age 3 (1975) and a total of 1,139 lived in the Dunedin metropolitan area or Otago 
province and were thus eligible for inclusion in the study. Of the 1,139 children, 
1,037 were assessed within one month of their third birthdays (1975-1976). The 
remaining children were not assessed because of parental refusal or because they
were traced too late for inclusion at this age. Assessments have occurred at two- 
yearly intervals with 991 assessed at age 5 (1977-1978), 954 at age 7 (1979-1980), 955 
at age 9 (1981-1982), and 925 at age 11 (1983-1984). 

When compared to all New Zealand children, the Dunedin sample is slightly
biased with more children being represented in the upper socio-economic status 
(SES) levels and fewer in the lower SES levels according to the Elley and Irving 
(1972) index. In addition, the sample is mainly of European origin with only 2 per 
cent of Maori and Polynesian background compared with about 10 per cent for the 
country as a whole (Department of Statistics, 1976). 


Identification of reading retarded children 

Word recognition was assessed at ages 7, 9, and 11 with the Burt Word Reading 
Test (Scottish Council for Research in Education, 1976). At these same ages, 
intelligence was assessed using the Wechsler Intelligence Scale for Children — 
Revised, or WISC-R (Wechsler, 1974). Reading and IQ scores at all three ages were 
available for a total of 917 children. At age 11, scores on the PAT (Progressive 
Achievement Tests) Reading Comprehension Test (Elley and Reid, 1969) and the 
PAT Reading Vocabulary Test were also available for 616 of these children. These 
data were available from a broader study of educational attainment of 2,600 Form 1 
and II children from Dunedin’s six intermediate schools (Silva, 1984). (Silva found
no significant differences between children enrolled in the Health and Development 
Study and those not enrolled.) PAT reading scores were recorded as age-based 
percentile ranks using norms from the test manuals. These distributions were 
therefore rectangular. However, since the regression procedures used to define 
specific reading retardation do not require either the reading or IQ scores to be 
normally distributed (Cohen and Cohen, 1975), the use of percentile ranks is 
permissible. 


Age was not required as an additional independent variable in these analyses 
since all children were tested at, or close to, their birthdays on the Burt and IQ 
measures, while PAT scores were already age-adjusted. 


For the reasons outlined above, a 5 per cent cut-off on the distribution of 
residual reading scores was used to select the group of extreme under-achievers 
rather than the cut-off based on the standard error of prediction. This method had 
the advantage of defining groups of equal size across ages and tests. 
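
The classification rule described above (regress reading on IQ, rank the residuals, and take the lowest 5 per cent) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names `fit_line` and `classify_bottom`, the rule for rounding the group size, and the 20 example scores are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify_bottom(iq, reading, fraction=0.05):
    """Return the indices of children whose residuals fall in the bottom fraction."""
    slope, intercept = fit_line(iq, reading)
    # Residual = actual reading score minus the score predicted from IQ.
    residuals = [r - (slope * q + intercept) for q, r in zip(iq, reading)]
    # Assumption: the group size is the fraction of the sample, rounded down.
    n_selected = max(1, int(len(residuals) * fraction))
    ranked = sorted(range(len(residuals)), key=lambda i: residuals[i])
    return sorted(ranked[:n_selected]), residuals


# Hypothetical scores for 20 children (not data from the study).
iq = [85, 90, 92, 95, 97, 98, 100, 100, 102, 103,
      105, 106, 108, 110, 112, 114, 115, 118, 120, 125]
reading = [30, 34, 35, 38, 40, 39, 42, 20, 44, 45,
           46, 47, 49, 50, 52, 53, 55, 56, 58, 61]
flagged, residuals = classify_bottom(iq, reading)
print(flagged)
```

Because the cut-off is a fixed proportion of the ranked residuals, the number of children selected depends only on sample size and not on the shape of the score distribution.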


RESULTS 


Stability of classification over ages 7, 9, and 11 

At each of these three ages, Burt reading scores were regressed on performance 
IQ. Inspection of the scatterplots of residuals indicated that the assumptions of 
linearity and homoscedasticity were satisfied. 


The correlations between Burt reading scores and Performance IQs were 
0.40, 0.41, and 0.42 at ages 7, 9, and 11 respectively. The regression equations
were: Predicted Reading (age 7) = 0.37 × IQ − 9.6, Predicted Reading (age 9)
= 0.52 × IQ − 1.0, and Predicted Reading (age 11) = 0.54 × IQ + 12.8.
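
As a worked illustration of how these equations yield a residual reading score, the following Python sketch applies the reported coefficients to a single hypothetical child. The function names and the example scores are assumptions introduced here; only the slopes and intercepts come from the text.

```python
# Reported equations: predicted Burt score = slope * Performance IQ + intercept.
EQUATIONS = {7: (0.37, -9.6), 9: (0.52, -1.0), 11: (0.54, 12.8)}


def predicted_reading(age, performance_iq):
    """Burt reading score predicted from Performance IQ at the given age."""
    slope, intercept = EQUATIONS[age]
    return slope * performance_iq + intercept


def residual_reading(age, performance_iq, actual_reading):
    """Actual minus predicted score; large negative values mean under-achievement."""
    return actual_reading - predicted_reading(age, performance_iq)


# Hypothetical child: Performance IQ of 100 and a Burt score of 30 at age 9.
example = residual_reading(9, 100, 30)
print(round(example, 1))
```

For this hypothetical child the predicted score is 51, so the residual is about 21 points below expectation.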


At each age the specific reading retardation group was designated as those 
children whose residual reading scores fell in the bottom 5 per cent of the 
distribution. This yielded a group of 44 children at each age. If classification is age- 
specific, there should be little overlap between these three groups. If, on the other 
hand, specific retardation in reading represents an enduring characteristic of a 
subgroup of children, substantial overlap would be expected. Figure 1 depicts the 
overlap in groups across the three ages. 


FIGURE 1

OVERLAP BETWEEN GROUPS CLASSIFIED AT AGES 7, 9 AND 11 AS SPECIFIC READING RETARDED

[Diagram of three overlapping groups: Age 7 (n=44), Age 9 (n=44), Age 11 (n=44)]


Of the 44 children classified as reading retarded at age 7, 25 (57 per cent) were
reclassified as reading retarded either two or four years later, or both. Only 19
children were found classified as reading retarded at age 7 alone. Seventy-five per
cent (33/44) of the reading retarded group at age 9 were classified as reading
retarded either two years earlier, two years later, or both. At age 11, 28 out of 44 
children (63 per cent) had been previously classified as reading retarded. At all three 
ages examined, a majority of children classified as reading retarded were reclassified 
as reading retarded at another age. These data represent substantial overlap across 
ages and consequently fail to support the suggestion that the classification method 
advocated by Thorndike and Yule is age-specific. Rather, it identifies a fairly stable 
group of children characterised by enduring under-achievement in reading across the 
primary years of schooling. These data also provide a clear practical demonstration 
of effective elimination of the perennial regression-to-the-mean problem and the 
consequent difficulty in reliably identifying extreme groups. 
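
The overlap counts reported above amount to simple set operations on the lists of children classified at each age. The Python sketch below shows the calculation on hypothetical child identifiers; the function name `overlap_summary` and the example sets are assumptions and do not reproduce the study's data.

```python
def overlap_summary(groups):
    """For each age, report how many classified children were also classified at another age.

    Returns {age: (number reclassified elsewhere, group size, rounded percentage)}.
    """
    summary = {}
    for age, members in groups.items():
        # Everyone classified at any of the other ages.
        others = set().union(*(m for a, m in groups.items() if a != age))
        repeated = members & others
        percent = round(100 * len(repeated) / len(members))
        summary[age] = (len(repeated), len(members), percent)
    return summary


# Hypothetical identifiers of children classified at each age.
groups = {7: {1, 2, 3, 4}, 9: {2, 3, 5, 6}, 11: {3, 6, 7, 8}}
summary = overlap_summary(groups)
all_three = set.intersection(*groups.values())
print(summary, all_three)
```

Applied to real classification lists, the same calculation would give the proportion of each age group that was also identified at some other age, together with the children identified at every age.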


To determine whether there were any differences between children classified 
once, twice, or three times as reading retarded, the reading, IQ and spelling scores of 
these groups were compared. Spelling was assessed at ages 9 and 11 using the short 
form of the Dunedin Spelling Test (Silva et al., 1984). This test comprised 25 words
graded in difficulty. 


TABLE 1
READING, SPELLING, AND IQ MEANS FOR TRANSIENT, MODERATELY
STABLE, AND VERY STABLE READING RETARDED GROUPS

                                            Moderately    Very
                               Transient    Stable        Stable
Variable                       (N=46)       (N=22)        (N=14)
Burt Reading, Age 7¹             14.8         12.1           7.5
Burt Reading, Age 9²             32.7         27.6          21.9
Burt Reading, Age 11¹            48.5         45.1          33.9
Spelling, Age 9²                  3.6          2.1           1.0
Spelling, Age 11                  9.0          7.5           4.9
Verbal IQ, Age 7                107.8        112.8         113.0
Performance IQ, Age 7           103.1        106.0         104.6
Verbal IQ, Age 9                 94.2         96.3          93.4
Performance IQ, Age 9           103.8        109.5         106.1
Verbal IQ, Age 11                95.1         98.3          95.1
Performance IQ, Age 11          113.3        113.7         113.4
Residual Reading, Age 7¹        -15.1        -19.6         -24.3
Residual Reading, Age 9²        -20.3        -28.4         -32.4
Residual Reading, Age 11²       -25.0        -28.6         -39.7

¹ Transient and moderately stable groups differ significantly from very stable
group.

² Transient group differs significantly from moderately stable and very stable
groups.


Forty-six children were classified as reading retarded at one age only
("transient"), 22 at two ages ("moderately stable") and 14 at all three ages
("very stable"). Means for these three groups on reading, WISC-R Verbal and
Performance IQs, spelling, and residual reading scores appear in Table 1. One-way
analyses of variance were carried out on each of these measures. Overall tests were
followed up with Scheffé post hoc tests. The Bonferroni inequality (Grove and
Andreasen, 1982) was used to maintain the family-wise error rate at 0.05. The five
achievement measures were designated as one family. Each ANOVA was tested with
P = 0.01 (0.05/5). The six IQ measures, designated as a second family, were each
tested with P = 0.0083 (0.05/6), while each measure in the third family (the residual
reading scores) was tested with P = 0.0167 (0.05/3).
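
The per-test significance levels quoted above follow directly from dividing the family-wise rate by the number of measures in each family. A minimal sketch of that arithmetic is given here for illustration; the function name `bonferroni_alpha` and the dictionary of family sizes are assumptions introduced for the example.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level that holds the family-wise rate at family_alpha."""
    if n_tests < 1:
        raise ValueError("a family needs at least one test")
    return family_alpha / n_tests


# Family sizes as described in the text: five achievement measures,
# six IQ measures and three residual reading scores.
families = {"achievement": 5, "IQ": 6, "residual reading": 3}
for name, size in families.items():
    print(f"{name}: P = {bonferroni_alpha(0.05, size):.4f}")
```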

There were no significant IQ differences between any of the groups. The very 
stable group was found to have significantly lower means on Burt reading at ages 7 
and 11. Both moderately stable and very stable groups were found to have lower 
means than the transient group on Burt reading at age 9 and spelling at age 9. This 
pattern of results was also reflected in the residual reading scores which indicated 
more severe under-achievement associated with greater stability of classification. 


Stability of classification across tests 

Data available at age 11 on Burt reading, PAT Comprehension, and PAT
Vocabulary for 616 children were used to examine the stability of classification across
tests. Correlations with Performance IQ were 0.42, 0.44, and 0.47 for Burt,
PAT Comprehension, and PAT Vocabulary respectively. Regression equations
were: Predicted Burt = 0.54 × IQ + 12.9, Predicted PAT Comprehension = 0.81
× IQ - 38.4, Predicted PAT Vocabulary = 0.85 × IQ - 43.8. A 5 per cent cut-off
at the extreme lower end of the distribution of residuals produced three groups, each
with 31 children. The overlap between these groups is depicted in Figure 2.
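
The classification rule described above (predict reading from IQ, take the residual, and select the lowest 5 per cent) can be sketched as follows. This is an illustration under stated assumptions and not the authors' program: the function names are hypothetical, the residual is taken as observed minus predicted score, and the toy data are invented. Only the Burt coefficients come from the text.

```python
def classify_lowest_residuals(iq, reading, slope, intercept, fraction=0.05):
    """Return the indices of children whose residual (observed minus predicted
    reading score) falls in the lowest `fraction` of the sample."""
    residuals = [r - (slope * q + intercept) for q, r in zip(iq, reading)]
    n_selected = round(fraction * len(residuals))
    ranked = sorted(range(len(residuals)), key=lambda i: residuals[i])
    return ranked[:n_selected]


def predicted_burt(piq):
    """Burt score predicted from Performance IQ, using the equation in the text."""
    return 0.54 * piq + 12.9


# Toy sample of 20 children, all with IQ 100; only the last reads far below prediction.
iq = [100] * 20
reading = [67] * 19 + [30]
print(classify_lowest_residuals(iq, reading, slope=0.54, intercept=12.9))

# With the 616 children of the study, a 5 per cent cut-off selects 31 cases.
print(round(0.05 * 616))
```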


FIGURE 2
OVERLAP BETWEEN GROUPS CLASSIFIED AS READING RETARDED ON THREE MEASURES OF
READING ACHIEVEMENT AT AGE 11
[Figure: overlapping groups labelled Burt Reading (n=31), PAT Comprehension (n=31), PAT Vocabulary (n=31)]


A substantial number of children classified as specific reading retarded on any 
one measure were also classified as specific reading retarded on a second measure or 
on both the other measures. Figure 2 also shows that the overlap was greater 
between the Burt and PAT Vocabulary classifications than between PAT 
Comprehension and the two word recognition measures. Although there is
substantial agreement in classification across tests, there appear to be some
test-specific factors unique to comprehension- versus word recognition-based
classifications.

A final set of analyses dealt with the overlap between classifications based on
Performance IQ versus Fullscale IQ. The three reading measures available at age 11
were used for these analyses. Correlations between Fullscale IQ and Burt, PAT
Comprehension, and PAT Vocabulary scores were 0.58, 0.63, and 0.69
respectively. Regression equations were: Predicted Burt = 0.76 × FIQ - 9.9,
Predicted PAT Comprehension = 1.2 × FIQ - 79.2, Predicted PAT Vocabulary
= 1.3 × FIQ - 89.3. Again, the lowest 5 per cent (N=31) of residual scores was
used to define reading retarded groups. The overlap between groups defined on the
basis of either Performance or Fullscale IQs is depicted in Figure 3.


FIGURE 3
OVERLAP BETWEEN GROUPS CLASSIFIED AS SPECIFIC READING RETARDED ON THE BASIS OF EITHER
PERFORMANCE (PIQ) OR FULLSCALE (FIQ) WISC-R IQs
[Figure: overlap between PIQ-based and FSIQ-based groups, shown separately for Burt Reading, PAT Comprehension, and PAT Vocabulary]


It is clear from Figure 3 that agreement between the two classifications is high 
with the majority of children being classified as reading retarded by both IQ tests. 


DISCUSSION 

On the whole, there was substantial stability of classification across the various 
reading and IQ tests used in this study. In all cases, a majority of children classified 
as specific reading retarded on one measure were reclassified as specific reading 
retarded on at least one other measure. There was less agreement, however, between 
classifications based on reading comprehension and word recognition than between 
the two measures of word recognition. It seems likely that test-specific factors may 
partly account for these differences. 


Substantial stability of classification was also evident across ages 7, 9, and 11,
when the same IQ and reading measures were used at all three ages. Once again, a
majority of children classified as specific reading retarded at one age were
reclassified as specific reading retarded at another age. Children classified at all
three ages were found to be more severely under-achieving in reading than children
classified at one age only. This, of course, is hardly surprising. An important
question, however, is why some children shift out of the reading retarded category
while others remain. Part of the reason is, no doubt, imprecision of
measurement. The contribution of factors such as family background and behavioural
and cognitive variables remains to be investigated.


These data indicate the existence of a stable group of children characterised by 
severe under-achievement in reading throughout the primary years of schooling. 
Furthermore, the reading problem in this group is evident at least as early as age 7. 
Consequently, there appears to be little reason to delay intervention/remediation 
beyond this age. Research into remediation of reading difficulties also appears to 
support this view. Findings on the effectiveness of remedial reading instruction, 
which is normally instituted around age 9, have been uniformly negative in terms of 
the long term gains relative to controls (Shearer, 1967; Carroll, 1972; Yule, 1976). 
On the other hand, at least two early intervention studies with younger children
(Arnold et al., 1977; Bradley and Bryant, 1983) have obtained long-term gains that
were both statistically and educationally significant.


PALLIATIVE VS DIRECT ACTION STRESS-REDUCTION
PROCEDURES AS TREATMENTS FOR READING 
DISABILITY 


By CHRISTOPHER F. SHARPLEY 
(Faculty of Education, Monash University, Australia) 


AND STEVEN E. ROWLAND 
(The Scot’s School, Bathurst, NSW, Australia) 


Summary. Several previous reports of the treatment of learning disabilities via EMG 
biofeedback and relaxation training have suggested that these can be effective in reducing 
the high levels of stress which accompany the academic performance of learning disabled 
children, resulting in improvements on dependent variables such as reading, spelling, 
arithmetic and handwriting. The present study examined the relative efficacy of EMG 
biofeedback, relaxation training, remedial teaching and two control procedures with 50 
elementary schoolchildren who had been referred for reading disability. Daily reading 
“probes” of reading accuracy, speed and comprehension were collected as well as pre- vs 
post-test data on two standardised tests of reading skill. Both time-series analyses and 
analysis of variance statistics failed to show significant improvements in accuracy and 
speed of reading for any but the remedial teaching treatment group. It is concluded that
"direct action" to alleviate stress due to threat of failure at reading is more effective than
"palliative" attempts at reduction of tension with no action upon the performance deficit
itself. Implications for teaching and policy for teacher training and provision of services 
are discussed, with suggestions made for further research. 


INTRODUCTION 


High levels of stress have been shown to exert harmful influences upon a range of
human activities (e.g., Combs and Taylor, 1952; Cumming and Croft, 1974).
Welford (1974) listed five elements in the relationships between stress and
performance, noting that "the consequences of failure to meet demand must be
regarded as important by the person concerned" (p. 1). Reactions to such failure
may take physiological (e.g., increased heart rate, blood pressure, respiration and 
muscular tension) or psychological form (e.g., increased anxiety and fear of future 
incidence of the same stimulus), depending upon a number of factors such as 
previous experience, level of arousal and degree of threat induced by the failure. 


Concerning intellectual/academic tasks, Combs and Taylor (1952) demonstrated
that even a mild level of threat-induced anxiety can impair performance.
With particular reference to formal learning situations, Thompson et al. (1980) used
electromyographic (EMG) biofeedback to reduce muscle tension levels of college
students and found this to be associated with higher grades than were control
procedures. Following Sheer (1977), who showed that children who are classified as
"learning disabled" demonstrated high levels of tension accompanied by an
inability to focus upon learning tasks, a series of studies by Carter and associates
has linked lowered muscle tension via EMG biofeedback and relaxation training to
improvements in handwriting, reading, spelling and arithmetic (Carter et al., 1978;
Carter, 1979; Carter et al., 1979; Carter and Russell, 1980). While the use of
biofeedback and relaxation training procedures to help individuals learn control of
physiologically-based anxiety symptoms has been well documented (e.g., Peper et
al., 1979; Yates, 1980), and EMG biofeedback is among the most reliable and
precise of these procedures, the exact causal relationship between muscle tension
and anxiety as a psychological state is not universally acknowledged. Cox (1978)
suggested that stress-coping procedures may be divided into "direct action" or
"palliation". Direct action may take the form of preparation against harm or
threat, while palliation can be either symptom-directed (e.g., tranquillisers, EMG
biofeedback) or intrapsychic (e.g., denial, displacement). While these intrapsychic
methods are not recommended, the issue of direct action vs palliation has particular
relevance for treatment of learning disabled children. For example, if direct action
such as remedial tuition is less (or even no more) effective than relaxation via general
relaxation procedures or specific muscle relaxation with biofeedback, then the large
amounts of temporal, financial and human resources used in training specialist
teachers and providing remedial tuition may be unnecessary. This issue requires
resolution at both policy and individual child treatment levels.


Although defined variously by a range of authors, the term "learning
disability" is generally recognised to include the following: a significant deficit in
learning processes such that special educational procedures are necessary for
remediation; a discrepancy between actual and expected achievements in one or
more academic tasks; and a disability that is not a result of sensory, motor, intellectual,
social or historical handicaps (Johnson and Morasky, 1977). It should be added
that, as noted by Hallahan and Kauffman (1978, p. 126), "in most cases the cause of
a child's learning disability remains a mystery". Bryan (1974) indicated that the
learning disabled child exhibits considerable distractibility which is not conducive to 
learning and this distractibility could be a reaction to perceived threat of failure 
from instructional situations. If it is the threat of future failure in typical learning 
tasks which leads to this distractibility and further poor performance, then the 
reduction of this threat ought to be associated with improvements in performance. 
The present study examined the relative effectiveness of two types of coping 
procedures designed to reduce threat: (1) direct action upon reading performance, 
and (2) palliation of the symptoms (presumably) arising from previous poor 
performance and resultant high anxiety. 


Direct action in the present study took the form of remedial teaching of specific 
individual performance deficits. Palliation was administered by EMG biofeedback 
for specified muscle relaxation or the use of general relaxation training procedures 
designed to decrease overall anxiety and tension. It was hypothesised that: (1) both 
direct action and palliation would be effective in improving reading performance 
with reading disabled children during treatment, and (2) both direct action and 
palliation would be significantly superior to control procedures on a pre- vs post-test 
comparison. Because two sets of treatment-control comparisons were hypothesised,
it was necessary to include control groups which were measured (1) during treatment
and (2) before and after treatment. Two control groups were therefore designated. One
received no treatment other than collection of daily data as for the direct action and two
palliative treatments, and the second control group received no treatment other than
collection of pre- and post-test data. These two control groups are described below.


METHOD 

Subjects

Fifty grade 4 and 5 children (31 boys) ranging from 9:0 to 11:9 years
(mean = 9:10) who had been referred for reading disability and fulfilled the other two
requirements under the definition of reading disability given below were subjects for
the study. All of these children were from five small primary schools in four country
towns (max. population = 8,000) in northern New South Wales, Australia. Subjects
were allocated to one of five treatment conditions across all schools, as shown in
Table 1. All data were collected and treatments administered by a postgraduate
student in special education.


TABLE 1
DISTRIBUTION OF SUBJECTS BY SCHOOL AND TREATMENT AT COMMENCEMENT OF STUDY

          EMG           Relaxation   Remedial   Daily measures
School    biofeedback   training     teaching   only             Control
1 10 
2 4 10 
3 4 
4 10 2 
5 10 
Total 
Subjects 10 10 10 10 10 


Reading disability 

This was defined according to three criteria. First, the children met Kaluger and Kolson's
(1969) definition of secondary reading disability (i.e., those children whose learning
faculties are intact but who have failed to acquire the necessary reading skills).
Second, all of these children had been referred by their teachers as "experiencing
considerable difficulty in the learning and performance of reading skills". Third,
each had a reading age more than 1.5 years below chronological age on both the
GAP Test of Reading Comprehension (McLeod, 1977) and the St. Lucia Graded Word
Reading Test (Andrews, 1977). Finally, the performance aspect of "reading
disability" was defined using the International Reading Association criteria in
relation to oral reading performance (Bush and Waugh, 1976). A child was defined
as reading-disabled if reading at "frustration" level for word recognition and
comprehension (less than 90 per cent accuracy) at the relevant grade level. The use of
these criteria ensured that these children were part of the wider group of children 
classified as reading-disabled, as well as providing precise measures of performance 
for the dependent variables. 
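
The measurable parts of this definition can be expressed as a simple screening rule. The sketch below is illustrative only and rests on several assumptions: ages are given in decimal years, a single accuracy percentage stands in for the "frustration" level check on both word recognition and comprehension, and the function and parameter names are hypothetical. The first criterion (intact learning faculties) is a clinical judgement and is not modelled.

```python
def meets_reading_disability_criteria(
    teacher_referred,
    chronological_age,
    gap_reading_age,
    st_lucia_reading_age,
    probe_accuracy_pct,
    min_lag_years=1.5,
    frustration_cutoff_pct=90.0,
):
    """Screen one child against the referral, reading-age and accuracy criteria."""
    # Reading age must lag chronological age by more than 1.5 years on BOTH tests.
    lag_ok = (
        chronological_age - gap_reading_age > min_lag_years
        and chronological_age - st_lucia_reading_age > min_lag_years
    )
    # "Frustration" level: less than 90 per cent accuracy at grade level.
    at_frustration_level = probe_accuracy_pct < frustration_cutoff_pct
    return teacher_referred and lag_ok and at_frustration_level


# Hypothetical child aged 10.0 with reading ages of 8.0 and 8.2 and 85 per cent accuracy.
print(meets_reading_disability_criteria(True, 10.0, 8.0, 8.2, 85.0))
```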


Dependent variables 

All four of the studies quoted above which have used EMG biofeedback and 
relaxation training for the treatment of learning disability treated children 
experiencing reading difficulties. Because this enabled more exact replication of 
these previous studies as well as measuring change in one of the most fundamental 
academic areas of elementary school, reading was chosen as the dependent variable 
in this study. Performance on reading during the study (Hypothesis 1) was measured
by daily reading "probes", and data were collected on "rate" (words per minute),
"accuracy" (per cent errors), and comprehension (per cent recall correct). A reading
"probe" is a short (about 200 words) piece of prose which is assessed for
readability, presented on a card to the reader and for which several comprehension
questions (usually 10) have been devised. The tester asks the reader to read the
prose, noting errors onto a copy of the prose and timing the reader over the first 100
words. Thus "rate", "accuracy" and comprehension are assessed. While this
represents a variation in procedure from those studies referred to earlier (which used
pre- vs post-testing with the Wide Range Achievement Test), this method of
measurement has a number of strengths over simple pre- vs post-test standardised
test data. First, the reliability of the measure is dramatically increased by utilising
repeated measures, especially on a daily basis. Secondly, the use of difference scores
by previous studies is of dubious worth, and results which indicate overall change
can be statistical artefacts. Finally, while the previous studies can be criticised
because of lack of control procedures, the multiple time-series design used herein
overcomes this lack, but demands the use of a measure which can be used often
without excessive practice effects. A total of 10 separate reading "probes" were
compiled, with difficulty held constant by use of the Fry Readability formula (Fry, 
1963). These 10 probes were used in turn, covering a two-week period before any 
repetition occurred. As will be seen later, there was no evidence of practice effects 
upon the second and following use of each probe. Performance on reading before 
and after the study (Hypothesis 2) was measured by the GAP and St. Lucia reading 
tests. As well as enabling Hypothesis 2 to be tested, this measure allows comparison 
with previous data from Carter and associates. 
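
The three probe measures described above can be computed from the tester's record of a single probe. The following sketch is an illustration under stated assumptions rather than the scoring procedure actually used: the function and argument names are hypothetical, rate is derived from the time taken over the first 100 words, and accuracy is expressed as per cent of words read correctly (the form in which it is tabulated in the Results).

```python
def score_probe(seconds_for_first_100_words, words_in_passage, word_errors,
                questions_asked, questions_correct):
    """Return rate (words/minute), accuracy and comprehension (per cent) for one probe."""
    # Rate: the reader is timed over the first 100 words only.
    rate = 100 / (seconds_for_first_100_words / 60.0)
    # Accuracy: assumed here to be per cent of words read without error.
    accuracy = 100.0 * (words_in_passage - word_errors) / words_in_passage
    # Comprehension: per cent of the comprehension questions answered correctly.
    comprehension = 100.0 * questions_correct / questions_asked
    return {"rate": rate, "accuracy": accuracy, "comprehension": comprehension}


# Hypothetical probe: 200-word passage, 20 errors, 2 minutes for the first
# 100 words, 4 of 10 questions correct.
print(score_probe(120, 200, 20, 10, 4))
```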


Procedure 

Data were collected within a time-series design which allowed each subject and
each treatment group to act as its own control, plus pre- and post-tests on the GAP
and St. Lucia tests. Keppel (1973) and Winer (1971) have suggested that the sometimes
spuriously high levels of variability encountered in experimental work in the
behavioural sciences may be avoided by using repeated-measurement procedures to
separate out the effects of differences between subjects which exist prior to the
experiment. With repeated measurements of each subject's responses over multiple
interventions, all subjects serve as their own controls because measurements are
based upon the deviations from the mean of each subject's responses. Thus the
performance of each subject is measured in such a way that "variability due to
differences in the average responsiveness of the subjects is eliminated from the
experimental error" (Winer, 1971, p. 261). The results obtained from such designs
are therefore more reliable than those obtained from single-intervention procedures
(Baer et al., 1968; Hanley, 1970; Namboodiri, 1972). Accordingly, an
experimental design in which repeated measurements were collected upon reading
accuracy, speed and comprehension was constructed and is depicted in Figure 1.
Intervention and data collection occurred on four days per week
for five weeks, preceded by two weeks of baseline when probes were administered
but no feedback was given. There were only three days of intervention during the fourth
week of intervention, owing to this being Easter and the schools having two days'
vacation.
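
The idea that each subject serves as his or her own control can be illustrated by expressing every score as a deviation from that subject's own mean. The sketch below uses invented scores and a hypothetical function name, and is meant only to show why differences in average responsiveness between subjects drop out of the comparison.

```python
from statistics import mean


def centre_on_subject_means(scores_by_subject):
    """Express each subject's scores as deviations from that subject's own mean."""
    centred = {}
    for subject, scores in scores_by_subject.items():
        subject_mean = mean(scores)
        centred[subject] = [score - subject_mean for score in scores]
    return centred


# Two hypothetical subjects with very different average levels but the same trend.
scores = {"A": [80, 90, 100], "B": [40, 50, 60]}
print(centre_on_subject_means(scores))
```

After centring, the two hypothetical subjects show identical deviations, so the 40-point difference in their average level no longer contributes to the error term.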


FIGURE 1
DESIGN OF THE STUDY

CONDITIONS
Pre-test:      GAP; St. Lucia
Baseline:      Probes collected on 8 days (2 weeks × 4 days/week) for
               accuracy, speed, comprehension
Intervention:  Probes collected on 19* days (5 weeks × 4 days/week) for
               accuracy, speed, comprehension
Post-test:     GAP; St. Lucia

* One week = Easter, with three days' data only


Treatments 

(1) EMG biofeedback: Previous studies by Carter and associates which are
mentioned above and which used EMG biofeedback as a treatment for reading
disability focused upon teaching relaxation of the preferred forearm, and this
procedure was followed here within the overall time-series design. For intervention,
EMG biofeedback consisted of a visual display in units from 1 to 10 upon a dial 6.5
cm wide. Electrodes were attached to the preferred forearm (as used in the previous
studies) and the procedure described below is closely similar to that used by Carter
and associates. Sessions were of approximately six minutes (5 minutes EMG, 1
minute reading probe) each day with EMG training procedure as follows:


(1) Baseline, 45 secs: subjects were told to sit quietly without fidgeting or
unnecessary activity.

(2) Training with feedback, 180 secs: subjects were told to decrease the EMG level
as much as possible. Exact instructions were to "keep the needle as low as you can".

(3) Transfer with no feedback, 15 secs: subjects were told to "keep the needle as low
as possible".

(4) Rest, 15 secs: subjects were told to "sit quietly" without trying to "do" anything.

(5) Baseline, 45 secs: as above; readings recorded each 15 seconds.


After initial experimentation, children realised that conscious attempts to lower the 
needle (e.g., by clenching fists) did not work, and learnt to sit quietly and use their 
own means of relaxation. With practice, all children were able to lower their EMG 
level at will. While this capability does not necessarily result in enhanced overall 
relaxation, the question of generalisability to other muscle groups is not of concern 
here. It was the purpose of this part of the study only to test whether teaching 
children to control voluntarily the EMG levels of their preferred forearms would 
lead to improvements in their reading performance, and thus it was sufficient to 
record that children had learnt this task, albeit with individual subject variations. 
The index of change was set as a reduction of not less than 5 on the visual scale of 
1-10 from readings collected during the initial baseline phase of each day's session 
for each child. While some children were able to reduce the visually recorded level of
EMG activity by more than 5, this was the minimum to which all children were
trained and constitutes the "EMG reduction" criterion used here. It should be noted
that each child's reduction was in terms of their own baseline levels, thus 
individualising treatment in repeated measures fashion. Reporting of microvolt 
reductions by the entire group does not allow for individual variability in muscle 
tension which is typically present between subjects. 
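
The session timing and the reduction criterion described above can be checked with a few lines of code. This is a sketch only: the names are hypothetical, and summarising each day's initial baseline by the mean of its readings is an assumption, since the text does not state how the baseline readings were combined.

```python
# Phase durations in seconds, as listed in the training procedure.
SESSION_PHASES = [
    ("baseline", 45),
    ("training with feedback", 180),
    ("transfer with no feedback", 15),
    ("rest", 15),
    ("baseline", 45),
]


def session_seconds(phases=SESSION_PHASES):
    """Total length of the EMG part of one session."""
    return sum(duration for _, duration in phases)


def meets_emg_criterion(baseline_readings, training_reading, min_reduction=5):
    """True if the dial reading fell by at least `min_reduction` units (scale 1-10)
    relative to the child's own baseline for that day.

    Assumption: the baseline level is the mean of the initial baseline readings.
    """
    baseline_level = sum(baseline_readings) / len(baseline_readings)
    return baseline_level - training_reading >= min_reduction


print(session_seconds())                      # seconds of EMG work per session
print(meets_emg_criterion([8, 8, 8], 3))      # reduction of 5 units
print(meets_emg_criterion([8, 8, 8], 4))      # reduction of only 4 units
```

The phase durations sum to 300 seconds, which agrees with the five minutes of EMG work stated for each session.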


(2) Relaxation training: As shown in Table 1, children under this treatment
received relaxation training in two groups of four and one pair. One child from one of the
groups of four left the school and did not complete the study, thus N=9 for this treatment.
Relaxation training consisted of 10 minutes progressive muscle relaxation 
(Jacobsen, 1938) during which the experimenter read from a standard relaxation 
script. Once again, this procedure is either identical or very similar to that used in 
the previous literature cited above. 


(3) Remedial teaching: So as to maximise the effects of this intervention, a 
remedial teaching programme which was different in method and content from 
normal classroom teaching in these schools was devised. Van Houten (1980) 
emphasised the importance of feedback in the learning process. It was evident from 
discussions with teachers and observations of their classrooms that this aspect of 
teaching was not emphasised in normal classroom procedure in these schools. 


Consequently, each child was equipped with a workbook which contained
graphs of reading rate, accuracy and comprehension. The use of these graphs in
conjunction with reading "probes" is suggested as an effective method of teaching
reading in special education settings (Van Houten, 1980, pp. 129-132). A similar
method was used by Linfoot (1979) who incorporated "minimum 'celeration lines"
and goal setting procedures. These procedures have been used to increase accuracy
of word recognition and integrated with other daily measures with beneficial results
(Linfoot, 1979, pp. 184, 187). A combination of these methods was used in this
study to define goals, provide feedback on daily reading performance, increase
motivation and increase performance on criteria.


The group work focused on the remediation of phonics and comprehension 
skills as well as reading speed. Again the difference was evident between this and 
normal classroom teaching. Group lessons consisted of 10 minutes of instruction in 
phonic skills, 20 minutes of comprehension exercises and 20 minutes of phonics 
exercises each day of the week. 


The teaching of phonics involved the presentation of sound and letter groups 
followed by a short discussion of these groups. The remainder of the 10 minutes of 
teaching was taken up with formulation of word lists containing the letter/sound 
groups in initial, medial and final positions. The word lists were made up of words 
suggested and known by the students. Formulation involved spelling the words and 
discussing meaning as well as pinpointing the phonic letter group being emphasised. 


This activity was followed up by comprehension and phonics exercises, using
graded exercise cards. Each child worked through the cards and presented written
work in an exercise book, which was then marked and handed back with
performance graphs brought up to date. The criterion for progress to another set of cards
was 90 per cent accuracy.


The apparent discrepancy in time between these three treatments requires 
comment. The studies by Carter and associates which used both EMG biofeedback
and relaxation training (Carter et al., 1978; Carter, 1979; Carter et al., 1979)
administered each for 10 minutes only. To allow for maximum replication, a similar 
time period was used for these two treatments in the present study. Similarly, the 
remedial teaching group received 10 minutes of instruction and the remainder of the 
period was practice. (Because of timetabling restrictions within this school, the 
experimenter was not able to limit the total contact time per day to only 10 minutes.) 
All treatments were thus similar in time allocated for individual instruction, with the
remedial teaching group receiving time for practice with the experimenter.


(4) Control groups: 

(1) Daily measures only. This group received the daily measure of reading
performance ("probes"), individually administered to all subjects during both
baseline and intervention phases of the study. Data were recorded, but no feedback
was given to subjects.


(2) Pre- and post-testing only. Apart from pre-test and post-test measures,
classroom activities were as normal except that, by arrangement with principals, 
parents and teachers, none of these 49 children received any remediation for reading 
during the period of the study. 


Following the completion of the study, both authors conducted a series of 
workshops for teachers in these schools to equip them with the skills and knowledge 
to perform the remedial teaching programme used in this study within their own 
classes. In addition, several of the most severe cases of reading disability were 
diagnosed and treatment specified. All teachers who attended these workshops 
expressed their gratitude. 


RESULTS 
Intervention data 
In order to test for changes from baseline to intervention, time-series analyses
were performed on the group mean scores for each day for each of the three
treatments and the daily measures only control group. Time-series analyses are
particularly relevant when testing whether changes over intervention are significant. As
explained in detail elsewhere (e.g., Glass et al., 1975; Jones et al., 1977), the presence of
correlation between consecutive data points ("serial correlation") can lead to
confounding effects when traditional data-analysis procedures are used (Gottman
and Glass, 1978; Sharpley and Rogers, 1981). Time-series procedures calculate these
autocorrelation effects and allow for them when producing "t" statistics for
changes in level (i.e., mean changes in performance) as well as slope (i.e., alteration
in the direction of data trends over phases).
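
Serial correlation of the kind referred to above can be illustrated by the lag-1 autocorrelation of a series of daily scores. The sketch below shows only this one descriptive quantity, using a common estimator and invented data; it is not the time-series procedure of Glass et al. (1975), and the function name is hypothetical.

```python
def lag1_autocorrelation(series):
    """Lag-1 autocorrelation: correlation between each point and the next one,
    using deviations from the overall series mean."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    series_mean = sum(series) / n
    denominator = sum((x - series_mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("series is constant")
    numerator = sum(
        (series[t] - series_mean) * (series[t + 1] - series_mean)
        for t in range(n - 1)
    )
    return numerator / denominator


# A steadily rising (hypothetical) series is positively autocorrelated,
# while a strictly alternating series is negatively autocorrelated.
print(lag1_autocorrelation([1, 2, 3, 4, 5]))
print(lag1_autocorrelation([1, -1, 1, -1]))
```

When consecutive daily scores are correlated in this way, tests that assume independent observations can misstate the significance of a change, which is why the dependence must be modelled before level and slope changes are tested.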


Means of the three reading probe measures (accuracy, rate, comprehension) 
from Baseline and Intervention across treatments are presented in Table 2. As 
mentioned above (Figure 1) these means represent 8 (Baseline) and 19 (Intervention) 
separate data collection points, all of which were fed into the time-series analysis so 
that slope (i.e., changes in the direction of data trends over time) as well as level 
changes could be tested. The results of the time-series analyses of level and slope 
changes for accuracy, rate and comprehension are shown in Table 3. 








TABLE 2
MEANS¹ FOR READING PROBE DATA ACROSS BASELINE AND INTERVENTION

                                               Treatment
                                                        Remedial    Daily
Variable                          EMG      Relaxation   Teaching    Measures
Accuracy (per cent correct)
  Baseline                        82.063   89.787       90.900      90.862
  Intervention                    77.764   85.316       92.386      85.458
Rate (words/minute)
  Baseline                        56.037   57.475       65.587      60.100
  Intervention                    48.942   55.516       65.021      54.963
Comprehension (per cent correct)
  Baseline                        37.500   42.625       47.500      45.125
  Intervention                    19.263   26.210       42.578      26.684

¹ Mean scores as provided here do not necessarily reflect "slope" changes. These
are calculated by comparison of all data points in each phase of the study (i.e.,
baseline and intervention) to detect trends in those data.











TABLE 3
TIME-SERIES ANALYSES OF READING PROBE DATA BASELINE TO INTERVENTION

                                          Treatment
                                                  Remedial     Daily
Variable                  EMG¹      Relaxation¹   Teaching¹    Measures¹
                          t         t             t            t
Accuracy        level     -1.65     -2.53*         2.35*       -1.49
                slope      1.69      3.43*         3.25*        1.86
Rate            level     -2.73*    -1.54         -0.05        -3.23*
                slope      1.22      0.49         -0.05         0.68
Comprehension   level     -1.39     -1.21         -0.37        -1.49
                slope      3.24*     5.13*        10.64*        3.29*

* P < 0.05; df = 27, 28

¹ The t statistic produced by time-series analysis can be tested for significance in the same manner as
traditional t-tests.


While there were increases on both level and slope for accuracy under the
remedial teaching treatment, a decrease in accuracy (level) was noted for those 
subjects receiving relaxation training. However, the trend was for this to improve, as 
shown by the significant increase in slope data for the relaxation group. It may be 
inferred that after an initial drop in accuracy, relaxation training subjects began to 
increase their accuracy over time. Because this study was limited by logistics to one 
school term of 10 weeks, the suggested long-term efficacy of relaxation training 
remains only a surmise. 


The rate of reading showed no positive change under any treatment, and there were 
two significant decreases in reading speed, under EMG biofeedback and in the 
daily-measures control group. Since neither of these decreases in speed was 
accompanied by an increase in accuracy (a speed-accuracy trade-off might otherwise 
have been inferred), these two significant decreases can only be seen as 
performance decrements. 


Comprehension showed significant trends towards improvement for all 
treatments and, although similar changes were not noted in level data, these trends 
argue for a practice effect here in terms of subjects becoming more alert to the 
presence of comprehension questions following each probe. Even without feedback 
as to the accuracy of subjects’ answers, the expectations that these questions would 
be asked could have encouraged more careful attention to the facts mentioned in 
each passage read. Practice effects due to familiarity of content may be ruled out as 
unlikely since this would also have resulted in higher accuracy and speed data. 


In summary, data collected during the baseline and intervention phases of this 
study do not support Hypothesis 1 concerning the effectiveness of both coping 
procedures for accuracy and speed of reading. Direct action (i.e., remedial teaching) 
showed significant positive effects upon accuracy of reading while only one of the 
palliative procedures — relaxation training — showed a trend towards improvement 
over time on this variable. The other palliative procedure — EMG biofeedback — 
has not been demonstrated as effective for accuracy or speed within this study. All 
treatments were effective in alerting children to comprehension questions. The 
presence of these effects for the control (daily measures only) group suggests that 
this improvement cannot be inferred as a result of any of the treatments per se, but 
rather is due to the measurement procedures used. 


Pre- and post-test measures 

Although not suggested as equally reliable, these data provide the only 
before/after comparison on standardised instruments. They are also similar to the 
WRAT pretest/post-test measures used by Carter and associates, thereby enabling 
comparisons to be drawn between the previous studies and this research. 


Analyses of variance upon post-test data for both the GAP and St. Lucia, with 
pretest differences covaried out of the analyses, indicated that there were no 
significant differences at the time of post-testing which could be attributed to 
treatment alone. (GAP: F = 0.645, df = 4, 48, NS; St. Lucia: F = 0.901, df = 4, 48, 
NS.) 
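The idea of covarying pretest differences out of a post-test comparison can be sketched as a comparison of two regression models. This is an illustration under stated assumptions, not the authors' computation: the function name `ancova_f` and the simulated scores are hypothetical, and the sketch assumes a single covariate with a common slope across groups.

```python
import numpy as np


def ancova_f(pre, post, group):
    """One-way analysis of covariance as a nested-model F test.

    Reduced model: post ~ intercept + pre
    Full model:    post ~ intercept + pre + group dummies
    Returns (F, df_between, df_error).
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)

    ones = np.ones(len(post))
    # One dummy column per group except the first (reference) group.
    dummies = np.column_stack([(group == g).astype(float) for g in labels[1:]])
    x_reduced = np.column_stack([ones, pre])
    x_full = np.column_stack([ones, pre, dummies])

    def rss(x):
        beta, *_ = np.linalg.lstsq(x, post, rcond=None)
        resid = post - x @ beta
        return resid @ resid

    df_between = dummies.shape[1]
    df_error = len(post) - x_full.shape[1]
    f_value = ((rss(x_reduced) - rss(x_full)) / df_between) / (rss(x_full) / df_error)
    return f_value, df_between, df_error


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    group = np.repeat(np.arange(5), [11, 11, 11, 11, 10])
    pre = rng.normal(50, 10, 54)
    post = pre + rng.normal(0, 5, 54)  # no treatment effect simulated
    print(ancova_f(pre, post, group))
```

With five groups, one covariate and 54 cases the function returns degrees of freedom of 4 and 48, the same as those reported above. Whether the study's analysis had exactly this structure is an inference and is not stated in this section.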


DISCUSSION 


The present study was designed to investigate the relative effectiveness of direct- 
action vs palliative treatments upon reading performance. Results from previous 
studies were not replicated, and no significant improvements were noted for EMG 
biofeedback. To the contrary, this treatment showed a significant decrease in 
reading speed and no significant change in accuracy. The alternative palliative 
treatment (relaxation training) produced more supportive data, with an immediate 
significant decrease in accuracy when treatment was begun, followed by a trend 
towards some long-term gain in accuracy. Only remedial teaching, tuition with 
feedback, showed significant improvement in accuracy. No treatment improved 
reading speed, and comprehension improvements cannot be attributed to any one of 
the treatments. Because of logistic restrictions which necessitated the organisation of 
treatments and schools shown in Table 1, these results are open to the criticism that 
treatment may have been confounded by school. For example, the remedial teaching 
treatment may have had different effects in other classrooms and schools, as may 
the two relaxation strategies. While this possibility of confounding is present and 
calls for caution when interpreting results as far as comparisons of treatments are 
concerned, it represents one of the restrictions which are imposed by performing 
applied research, and data still remain valuable as indicators of change or not 
despite this limitation. 


While these data are in opposition to those from some previous studies, it is 
suggested here that the relative effectiveness of remedial teaching vs EMG 
biofeedback vs relaxation training as treatments for reading disability was not 
adequately addressed in the studies performed by Carter and associates. In 
particular, previous data were gathered upon pre- vs post-test measures only, using 
gain scores, and are thus questionable. Of interest is the finding herein that 
this form of measurement produced results in disagreement with those from the 
repeated measures time-series design and analyses even within the present study, and 
should be treated as less reliable than the use of daily probes and statistical analyses 
designed for repeated measurements upon the same subjects and variables. While 
standardised tests such as those used in this and the other studies cited earlier are 
designed to assess a child once, or yearly at most, they are not designed for repeated 
use or as a pre- vs post-test measure, and were used as such in the present study for 
purposes of replication rather than reliable evaluation of change. Measurement 
errors which may occur when standardised tests are used to detect change can 
include regression to the mean effects which may lead to incorrect conclusions 
regarding change in performance or effectiveness of an intervention. For these 
reasons data collected via repeated observations of actual reading performance (i.e., 
the reading probes used here) are a far more reliable procedure for assessing the 
effectiveness of an intervention where small but real (and important) changes may 
occur. 
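The regression-to-the-mean problem with gain scores can be demonstrated by simulation. The following is a minimal sketch with hypothetical names and parameter values (test reliability, selection cutoff, sample size). It assumes that children are selected for low pretest scores, that both testings measure the same true score with independent error, and that no treatment effect exists at all.

```python
import numpy as np


def simulate_gain_scores(n_children=10000, reliability=0.7,
                         cutoff_percentile=20, seed=0):
    """Mean pre-to-post gain for low scorers when nothing has changed.

    True scores are standard normal. Error variance is chosen so that
    var(true) / var(observed) equals the stated reliability.
    """
    rng = np.random.default_rng(seed)
    true_score = rng.normal(0.0, 1.0, n_children)
    error_sd = np.sqrt((1.0 - reliability) / reliability)

    pre = true_score + rng.normal(0.0, error_sd, n_children)
    post = true_score + rng.normal(0.0, error_sd, n_children)

    # Select the children who scored lowest on the pretest.
    selected = pre <= np.percentile(pre, cutoff_percentile)
    return float(np.mean(post[selected] - pre[selected]))


if __name__ == "__main__":
    print("imperfect test:", simulate_gain_scores())
    print("perfectly reliable test:", simulate_gain_scores(reliability=1.0))
```

With an imperfectly reliable test the selected group shows a positive mean gain although no child's true score has changed; with a perfectly reliable test the spurious gain disappears.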


The reduction of stress which has been suggested as accompanying muscle 
relaxation may in fact work against improvement in reading. Welford’s (1974) 
postulation of an optimum level of stress, departure from which reduces 
performance, argues for the retention of action-producing stress. If, in the course of 
a treatment to reduce muscular tension, stress regarding reading performance was so 
reduced that reading became a low-priority activity, then children may have declined 
in their efforts to read accurately and faster. Alternatively (and more 
parsimoniously) the focusing of children’s attention upon muscle-relaxation rather 
than reading could have also reduced the amount of attention given to reading. In 
effect, children were learning an alternative rather than a correlated task, with 
predictable results. It may be suggested that replacing reading practice and tuition 
with any other task will decrease reading performance merely because the skills of 
reading are not being reinforced. If relaxation training or EMG biofeedback are 
justified as treatments for reading disability, then it is as ancillaries to remedial 
teaching. The relative effectiveness of remedial teaching with and without these 
palliative procedures needs to be further investigated. Of further interest for future 
research is the issue of the relative effectiveness of the nature of the remediation 
(i.e., phonics as used here or alternative orientations) plus the amount of time 
allowed for practice. Both of these variables were controlled in the present study so 
as to allow more direct replication of previous research, but it may be that focusing 
remedial teaching upon phonics led to the increase in accuracy which was an 
outcome of that treatment. While this is to be expected (and hoped for) a 
comparison of various remedial teaching orientations with relaxation strategies 
would allow more informed evaluation of this issue. In addition, manipulation of the 
time allowed for practice deserves further attention and is presently under 
investigation by the authors. 


A number of points emerge which have implications for training of teachers 
and provision of special services to the learning disabled in schools. Use of 
relaxation training should be seen as secondary to direct-action intervention as 
suggested by data-based instruction (e.g., Haring and Bateman, 1977; Van Houten, 
1980). Relaxation is certainly of general benefit, a finding too well established to be 
challenged by these data. However, it is suggested that relaxation via symptom- 
alleviation or palliation is not as effective for the treatment of reading disability as 
the direct action of teaching what has not been previously learnt. It may be that such 
direct action has a final stress-reduction effect which is superior to palliation, a topic 
for future study. The profusion of relaxation training procedures for use by 
educators and other personnel which has emerged over the past decade needs to be 
seen in perspective as an adjunct to cure rather than a cure in itself, at least for 
reading disability. The worth of ‘‘humanising’’ education processes is low when 
compared to removal of the source of anxiety which has led to stress, i.e., the threat 
of future failure at a specific task. There may indeed be a causal link between failure 
and stress, but the present study (contrary to some previous data) suggests that this 
link begins with task-failure rather than with stress, thereby constituting a challenge 
to effective teaching. Piaget suggested that education ought to be a process of 
providing the learner with a ‘‘mild challenge’’. Reduction of the level of this 
challenge so that it is negligible may simply result in negligible learning. 
Alternatively, reduction of inflated levels of challenge which have become 
threatening may be best achieved by removing the threat of failure rather than the 
symptoms which follow that threat. 


Requests for reprints should be addressed to Dr. C. F. Sharpley, Faculty of Education, 
Monash University, Clayton, Victoria 3168, Australia. 


THE ROLE OF GENERALISED EXPECTANCIES IN 
DETERMINING CAUSAL ATTRIBUTIONS FOR SUCCESS 
AND FAILURE IN TWO SOCIAL CLASSES 


By RUTH BUTLER 
(School of Education, Hebrew University of Jerusalem) 


Summary. The sources of discrepancies in causal attributions for success and failure and 
for self and other attributions of outcome may become clearer when specific research 
attention is paid to the role of generalised expectancies in the attributional process. The 
present study hypothesised that when information about task and outcome is 
standardised, the attributions of both actors and observers will tend to reflect generalised 
expectancies, evoked in this study by two levels of SES in both actor and observer 
conditions, more than they will self-serving biases. 230 sixth grade Jewish Israeli pupils of 
two SES levels were assigned to one of three experimental conditions. All pupils received 
eight anagrams, four soluble by all and four insoluble. Those in the self-attribution 
condition attributed their own outcomes to various causes, while those in the similar and 
different other conditions attributed the same outcomes for pupils of apparently similar 
or different social class, after having first completed the anagrams themselves. The results 
indicated that while pupils of high SES tended to attribute both their own and others’ 
outcomes in ways consistent with high generalised expectancy for success, pupils of low 
SES attributed their own outcomes more to external, unstable factors, and differentiated 
consistently in their attributions for the self, for similar and for different others. It was 
argued that these differences reflect undifferentiated, global perceptions of causality 
among high SES pupils, whose patterns of attribution are consistent both with teacher 
values and their own experience. Low SES pupils have more differentiated perceptions of 
causality since uncertainty as to the real causes of their learning outcomes motivates them 
to greater, but not always adaptive, attributional activity. 


INTRODUCTION 


The central role of causal attributions in determining perceptions and behaviour has 
been widely recognised. While the relevant literature is vast, much has been 
concerned with investigating the existence and sources of biases in the attributional 
process. Two main interpretations have been offered for the commonly reported 
asymmetries between self-attributions of success and failure outcomes, whereby more 
responsibility is accepted for success than for failure (Stevens and Jones, 1976) and 
between self and other attributions, whereby actors tend to make more situational 
attributions than do observers (Jones and Nisbett, 1971). One direction has been to 
understand these asymmetries as deriving from ego-enhancing and ego-defensive 
motivational biases (Bradley, 1978). The second approach (Ross, 1977) argues that 
such asymmetries can be parsimoniously explained in cognitive, information 
processing terms. 


Ross (1977) suggests that discrepancies between self and other attributions may 
reflect the fact that the former are more accurate because they are based on greater 
information. Actors have access not only to the information provided by direct 
experience with the experimental task but also to that provided by their own past 
outcomes in similar situations. It can be argued that similar reasoning may also 
explain the reported discrepancies between self-attributions for success and failure, 
since these have been observed largely with college students who may well have had 
more experience with success than with failure outcomes. Already in 1967, Kelley 
hypothesised that behaviour that is consistent with some generalised expectancy will 
be attributed to stable properties of the actor, while unexpected outcomes will be 
attributed to temporary and/or situational factors. If success is indeed a more 
expected outcome than failure among most college students, this can explain why 
success outcomes will be attributed, on the basis of past experience, to factors such 
as ability, while there will be a greater tendency to seek situational information to 
explain failure. This tendency may be especially marked in experimental situations, 
since, as Monson and Snyder (1977) note, most studies of actor attributions have 
been carried out in settings where situational forces were dominant. 


This discussion highlights the role of generalised expectancies, whose cardinal 
importance in influencing behaviour in a wide variety of settings has been amply 
demonstrated (Jones, 1977; Rosenthal and Rubin, 1978). Thus it is somewhat 
surprising that this variable has to date received relatively scant attention in the 
context of attributional processes, where most research interest has focused on the 
manipulation of specific experimental expectancies. One exception is Deaux’s (1976) 
treatment of gender as a variable generating different patterns of expectancy and 
attribution. She concluded that the research on sex differences in patterns of causal 
attribution among adults indicates that the success of men and failure of women, 
which are expected outcomes according to prevailing social stereotypes, tend to be 
attributed by both actors and observers to ability, while the unexpected outcomes of 
male failure and female success are attributed to effort or luck. 


Further light on the role of generalised expectancy may be shed by investigating 
another variable associated with differential histories and expectancies of success — 
socio-economic status (SES). Work within the social learning tradition has 
established fairly consistent SES differences in locus of control, with subjects of 
high SES tending to adopt a more internal and those of low SES a more external 
locus of control (Battle and Rotter, 1963; Milgram, 1971). In addition, Gurin et al. 
(1969) found that blacks who expressed an external locus of control when studying 
their own outcomes tended to perceive the outcomes of others as determined by 
internal causes. In addition, many blacks attributed their failures specifically to 
societal pressures (‘‘system blame’’) rather than to vaguer causes such as luck. 


These findings suggest that the external attributions of certain social groups 
may reflect an accurate and differentiated social perception of their own limited 
opportunities in the face of objectively powerful external forces, rather than some 
generalised world view which sees little relation between behaviour and outcomes in 
a basically incomprehensible and unpredictable world. More specifically, they invite 
the hypothesis that both actors and observers may evaluate the determinants of 
outcomes at least in part according to generalised expectancies as to what constitutes 
an expected outcome for a particular social group. Thus within the two-dimensional 
model proposed by Weiner et al. (1971), one can hypothesise that the expected 
outcomes of high SES success and low SES failure will be attributed more to stable, 
internal causes than will the unexpected outcomes of low SES success and high SES 
failure. 


The role of SES has been largely neglected by attribution theorists, and the 
results of the few relevant studies are inconclusive. Thus Friend and Neale (1972) 
found the self-attributions of white fifth graders to be more internal after failure, 
but not after success, than those of blacks, while there were no effects for SES. 
Raviv et al. (1980), using an Israeli sample of the same age, found effects for SES, 
but not for ethnic origin, after failure. Interpretation of these results in terms of 
their contribution to an analysis of generalised expectancy is further complicated by 
the fact that they were limited to actors alone, and the range of SES in the Friend 
and Neale study was limited (low and lower-middle). In addition, it is difficult to 
compare their results with those of other studies showing asymmetries in the causes 
proposed by Weiner et al. (1971), since Friend and Neale analysed mainly the 
internal-external dimension, while Raviv et al. treated nine rather than the usual 
four causes. 


The foregoing discussion suggests that while SES, as a variable generating 
different histories and stereotypes vis-à-vis achievement, should yield patterns of 
attribution which may further understanding of the role of generalised expectancy in 
attributional processes, no studies to date have treated such generalised 
expectancies as the major experimental variable. The effects of this dimension 
should emerge more clearly when both actor and observer attributions are studied 
within a single design, and when efforts are made to reduce both the salience of 
specific situational cues, and the usual discrepancies between the task information 
available to actors and observers. These considerations suggested the following 
experimental design. Subjects of two SES levels were all exposed to the same success 
and failure experiences and asked to attribute outcomes to four factors — ability, 
luck, effort and task difficulty. Actors were asked to attribute causes for their own 
outcomes. In order to standardise the information about the task available to actors 
and observers, the latter themselves experienced success and failure before 
attributing identical outcomes in another. Background information was manipu- 
lated so as to create the impression that observers were being asked to attribute the 
outcomes of another pupil either similar to (in terms of SES) or different from them- 
selves. Subjects were sixth grade pupils since at this point locus of control seems 
fairly stable (Milgram, 1971) and the review by Cooper et al. (1981) suggests that 
there are as yet no sex differences in patterns of attribution, an advantage for a 
study concerned primarily with the effects of SES. 


Such a design enables the formulation of several specific hypotheses as follows: 


(1) The self-attributions of advantaged pupils will be primarily to ability after 
    success and to effort after failure, more than will those of disadvantaged 
    pupils. 

(2) Disadvantaged pupils will attribute their own success primarily to unstable 
    factors (luck, effort) and failure to external factors (luck, task difficulty) 
    more than will advantaged pupils. 

(3) Pupils of both SES levels will attribute the expected outcomes of an 
    advantaged other’s success and a disadvantaged other’s failure mainly to 
    stable, internal properties of the actor, and the unexpected outcomes of an 
    advantaged other’s failure and a disadvantaged other’s success to transient 
    and/or situational factors. 

(4) Standardisation of information about the task will reduce the usually 
    received differences between self and similar other attributions at both 
    levels of SES. 


METHOD 

Subjects 

Subjects were 230 sixth grade Jewish Israeli pupils (mean age 11, 9; 118 boys 
and 112 girls), attending four schools which served either predominantly disadvan- 
taged or predominantly middle-class populations, but which also had a sizeable 
minority of pupils of different SES. Such schools were chosen in preference to 
completely segregated or integrated ones in order to provide subjects who, while 
clearly advantaged or disadvantaged, would also have some contact with pupils of 
other social groups and would attend schools not too different in educational level. 
Within each kind of school two classes were randomly assigned to the self- 
attribution experimental conditions, and the remaining classes were assigned to the 
other-attribution conditions. Since there were differences in both the size of classes 
and in the representation of different SES levels within them, pupils were randomly 
selected on the basis of SES from the classes assigned to each condition so as to 
create two groups of 45 subjects (the High and Low SES self-attribution conditions) 
and four groups of 35 subjects (the High and Low SES similar and different other 
attribution conditions). Thus all groups included some subjects from each school, 
and similar numbers of boys and girls. There were 115 pupils at each level of SES, 
defined as low if both parents had fewer than eight years of schooling and as high if 
both had more than twelve. 
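The group sizes described above can be checked against the reported totals with a few lines of arithmetic. The constant names are hypothetical; the numbers are those given in the text.

```python
# Group sizes as stated in the Subjects section.
SELF_GROUP_SIZE = 45    # one self-attribution group per SES level
OTHER_GROUP_SIZE = 35   # one group per other-attribution condition per SES level
SES_LEVELS = ("high", "low")
OTHER_CONDITIONS = ("similar other", "different other")

# Pupils at each SES level: one self group plus two other-attribution groups.
per_ses = SELF_GROUP_SIZE + OTHER_GROUP_SIZE * len(OTHER_CONDITIONS)
# Total sample across both SES levels.
total = per_ses * len(SES_LEVELS)

if __name__ == "__main__":
    print(per_ses, total)
```

The two values agree with the 115 pupils reported at each level of SES and the 230 pupils reported overall.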


Experimental manipulations 

Outcome. Four soluble and four insoluble anagrams made up the success and 
failure conditions respectively. The anagrams were chosen from 20 previously given 
to pupils in a heterogeneous sixth grade class. The soluble anagrams chosen were 
those which all pupils solved, and which were rated as fairly (rather than very) easy. 
The insoluble anagrams were those which were rated very difficult (rather than 
insoluble). 


The anagrams were presented in random order both within and between 
subjects. In order to further disguise the manipulation, after each anagram subjects 
were presented with two alternative sets of causes for outcome, one to be ranked in 
the event of success and the other in the event of failure. 


Self and other attributions. Classes in the self-attribution condition were given 
a questionnaire consisting of the eight anagrams, each on a separate page. After 
attempting to solve each anagram, pupils were asked to rate separately on a 7-point 
scale the degree to which their success or failure was due to each of the four causes. 
These causes, with the success alternative presented first in each pair, were as 
follows (translated from the Hebrew). 


- Ability:          Because I am generally good at solving problems like this. 
                    Because I am generally not good at solving problems like this. 

- Task Difficulty:  Because the problem was easy. 
                    Because the problem was difficult. 

- Luck:             Because I was lucky and saw the answer. 
                    Because I was not lucky and didn’t see the answer. 

- Effort:           Because I tried hard. 
                    Because I didn’t try hard enough. 


In the observer conditions, pupils were given two questionnaires, one including 
only unsolved anagrams, and the other, filled in previously by the experimenter, 
which had apparently been completed by another child. This included background 
information about the child, anagrams with appropriate correct or incorrect 
solutions, and uncompleted sets of causes for rating. Subjects were asked to read the 
details the child had filled in and to try and imagine what the pupil was like. After 
attempting each anagram on the empty questionnaire, pupils were asked to turn to 
the same anagram on the completed questionnaire, to see whether the other pupil 
had succeeded or failed to solve it, and then to rate the reasons for his/her outcome 
as described above. Each pupil was given a questionnaire apparently completed by a 
pupil of the same sex and grade. 


Similar and different other. These conditions were created by the background 
information given on completed questionnaires. The details given about the other 
child — name, address and father’s country of origin and job — were designed to 
encourage perceptions of the child as coming from a successful, advantaged family 
of European origin or from an unsuccessful, disadvantaged one of North African 
origin, potent stereotypes within Israeli society. In the similar other condition pupils 
were given questionnaires of children apparently from a background similar to their 
own, while in the different other condition the other child was apparently of a 
different social background. 


Scoring 
Final scores for attributions to each cause consisted of mean attributions for the 
four success and four failure experiences computed separately. 
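The scoring rule can be sketched as follows. The data layout (a list of trial records, each holding an outcome and four ratings on the 7-point scale) and all names are hypothetical and chosen only for illustration.

```python
from statistics import mean

CAUSES = ("ability", "task_difficulty", "luck", "effort")


def score_pupil(trials):
    """Mean rating per cause, computed separately for success and failure.

    `trials` is a list of dicts with keys:
      'outcome': 'success' or 'failure'
      'ratings': dict mapping each cause to a rating from 1 to 7
    """
    scores = {}
    for outcome in ("success", "failure"):
        rated = [t["ratings"] for t in trials if t["outcome"] == outcome]
        scores[outcome] = {cause: mean(r[cause] for r in rated) for cause in CAUSES}
    return scores


if __name__ == "__main__":
    success = [{"outcome": "success",
                "ratings": {"ability": a, "task_difficulty": 3, "luck": 2, "effort": 4}}
               for a in (7, 5, 6, 6)]
    failure = [{"outcome": "failure",
                "ratings": {"ability": 2, "task_difficulty": 5, "luck": 3, "effort": e}}
               for e in (6, 6, 5, 7)]
    print(score_pupil(success + failure))
```

Each pupil thus contributes eight scores, one per cause for each outcome, which is the form in which the means are tabulated.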


Procedure 

The experiment was conducted in each class during regular school hours by one 
of three female graduate students. The experimenter explained that she was 
interested in testing a new word game: she needed to see how children solved the 
words in the game and what they felt contributes to success in such games. All pupils 
were then given a questionnaire with unsolved anagrams and told that they would be 
given two minutes to sort each set of jumbled letters into the original word. Pupils in 
the actor condition were told that after the two minutes they would be asked to 
estimate what had helped or hindered them in solving the anagram. These 
instructions and the rating system (from 7 if the reason was considered very 
important to 1 if it was considered to be not at all important) were then clarified with 
an example. Pupils were then told to begin the first anagram, and after two minutes 
to stop and complete the appropriate ratings. This procedure was repeated for the 
remaining seven anagrams. 


In the observer conditions children were given the two questionnaires described 
above. Instructions about the anagram task were as above. Pupils were then told 
that the second questionnaire had been filled in by another child, and were asked to 
read the details about him/herself which the child had filled in on page one and to 
try and imagine what kind of child he or she was. The experimenter was interested in 
knowing what, in their view, had helped or hindered this child in solving the 
anagrams. Thus after completing each anagram the pupils would be asked to turn to 
the same question in the other child's booklet, to see whether or not the child had 
solved the anagram, and then to rate reasons which had caused him/her to succeed 
or fail. The rating system was then explained and illustrated as described for the 
actor condition. The procedure was then as described above, except that after 
working on each anagram for two minutes pupils were reminded to turn to the other 
child's booklet and explain why he/she had succeeded or not succeeded in solving 
the same anagram. It was emphasised that they should explain the other's outcomes, 
not their own. 


On completion, classes were encouraged to talk about the anagrams, with the 
main aim of establishing whether pupils were aware of the fact that certain tasks 
were actually insoluble, and whether they believed that the ‘‘other child’’ was a real 
person. 


RESULTS 


The data were first analysed using successive 2 x 2 x 3 x 2 analyses of 
variance for attributions to each of the four causes, with sex, SES and attributor 
condition as between and outcome as a within factor. Since there were no significant 
main or interaction effects for sex, data for the sexes were combined and reanalysed 
using successive 2 x 3 x 2 analyses of variance. Means for each cell of the design 
are presented as Table 1 and the results of the four analyses of variance as Table 2. 


TABLE 1 
MEAN ATTRIBUTIONS TO EACH CAUSAL FACTOR IN EACH EXPERIMENTAL CELL 

                               High SES                          Low SES 
                               Similar   Different               Similar   Different 
Causes               Self      Other     Other         Self      Other     Other 

After success 
  Luck               2.07      2.17      2.23          4.37      4.77      2.69 
  Task Difficulty    3.70      3.82      3.52          2.90      4.80      3.38 
  Effort             3.05      3.53      3.27          4.99      3.73      3.25 
  Ability            5.65      5.30      5.19          2.56      2.50      5.21 
After failure 
  Luck               2.72      2.59      2.86          5.05      4.49      3.33 
  Task Difficulty    4.51      3.75      3.87          4.15      5.09      3.94 
  Effort             5.00      5.08      5.16          3.36      4.34      4.58 
  Ability            1.79      2.41      2.44          2.04      3.98      2.22 

N                    45        35        35            45        35        35 

TABLE 2 
SUMMARY OF EFFECTS OF ANALYSES OF VARIANCE FOR EACH CAUSAL FACTOR 

                                                   Causes 
Factors                          df     Luck       TD        Effort     Ability 

SES                              1     78.95**     0.97      0.33      39.98** 
Attributor Condition             2      6.39*      7.19**    0.17      14.14** 
Outcome                          1     40.51**    31.86**   62.50**   149.42** 
SES x Att. Cond.                 2      9.27**     9.68**    1.02      11.53** 
SES x Outcome                    2      1.77       5.30     69.70**    65.03** 
Att. Cond. x Outcome             2      6.56*      9.51**   13.60**    14.75** 
SES x Att. Cond. x Outcome       2      2.52       0.05     19.89**    19.21** 

*P < 0.05   **P < 0.001 


Attributions to ability 

Highly significant main effects of SES, attributor condition and outcome were 
qualified by a significant triple interaction of SES x Condition x Outcome. Figure 
1, which depicts this interaction graphically, indicates that pupils of high SES 
attributed success, but not failure, to ability. While this tendency was strongest in 
the self-attribution condition, attributions to ability were generally similar across 
self, similar other and different other conditions within each level of outcome. 
Among pupils of Low SES, on the other hand, ability was usually perceived as a less 
potent determinant of both success and failure. Moreover, there were also clear 
differences in attribution to ability between the attributor conditions. After success, 
ability was perceived as a major determinant of the success of different (i.e., High 
SES) others, but not of that of the self or a similar other. In fact, the mean rating for 
ability was similar to that received from High SES pupils. After failure, on the other 
hand, attributions to ability were highest in the similar other condition, and similar, 
and low, in the self and different other conditions. 


FIGURE 1 
ATTRIBUTIONS TO ABILITY IN SELF, SIMILAR OTHER AND DIFFERENT OTHER CONDITIONS 
BY SES AND OUTCOME 
[Line graph. Horizontal axis: attributor condition (self, similar other, different 
other). Separate lines for high SES and low SES; separate markers for success and 
failure.] 


Attributions to luck 

The analysis of variance yielded a significant main effect for SES, with low SES 
pupils making stronger luck attributions after both success and failure. In addition, 
the significant effect for outcome, together with the non-significant SES x Outcome 
interaction, indicates that pupils at both levels of SES made stronger luck 
attributions after failure than after success. The significant main effect for 
attributor condition was qualified by significant SES x Condition and Condition x 
Outcome interactions. These results indicate that for luck, as for ability, there were 
no differences in the attributions of High SES pupils to the self, similar or different 
others. Among low SES pupils there were differences, with the outcomes of the self 
and similar others being attributed more to luck than those of different others, after 
both success and failure. 


Attributions to task difficulty 

Significant main effects for condition and outcome, but not for SES, were 
qualified by significant interactions of SES x Condition and Condition x 
Outcome. While attributions to task difficulty tended to be higher after failure than 
after success in both SES groups, differences were again noted in the degree to which 
ratings distinguished between the attributor conditions at each level of SES. Among 
high SES pupils there were again no differences in their attributions to task 
difficulty after success, but after failure task difficulty was perceived as a more 
potent determinant in the actor than in the two observer conditions. Among Low 
SES pupils task difficulty was rated highest in the similar other condition, after both 
success and failure. While attributions within the similar and different other 
conditions were fairly similar after success and failure, in the actor condition task 
difficulty was seen as fairly irrelevant to success, but as importantly determining 
failure. 


Attributions to effort

A significant main effect for Outcome, but not for SES or Attributor 
Condition, was qualified by the significant triple interaction of SES x Condition x 
Outcome, which is depicted in Figure 2. As revealed by Figure 2, this effect, together
with significant interactions of SES x Condition and Condition x Outcome,
derived mainly from the finding that, while effort was perceived as a more 
important determinant of failure than of success in all High SES attributor 
conditions and in the two Low SES observer conditions, Low SES actors perceived 
effort as more determining of their own success than of their own failure. 


FIGURE 2

ATTRIBUTIONS TO EFFORT IN SELF, SIMILAR OTHER AND DIFFERENT OTHER CONDITIONS
BY SES AND OUTCOME

[Graph not reproduced. Horizontal axis: attributor condition (self, similar other, different other); separate lines for high and low SES and for success and failure.]


These results, while complex, can be summarised as follows. First, there were 
differences at both SES levels in attributions after success and failure. Second, the 
self-attributions of High SES pupils after both success and failure differed from 
those of Low SES pupils. Third, High SES attributions to the self, to a similar and to
a different other were generally similar, while among Low SES pupils these differed 
across all causes. These trends were then clarified, using additional analyses where 
necessary, as they bore on the specific hypotheses proposed by this study. 


SES differences in self-attributions for success and failure 

It was hypothesised that High SES pupils would tend more than Low SES 
pupils to attribute both success and failure primarily to internal causes, favouring 
ability after success and effort after failure. Examination of Table 1 indicates clearly 
that, as hypothesised, High SES pupils attributed their own success primarily to 
ability and least to luck, while Low SES pupils attributed success to effort and luck 
and least to ability. After failure, neither group attributed outcomes to lack of 
ability, but while High SES pupils attributed failure primarily to effort and task 
difficulty, Low SES pupils attributed failure to luck followed by task difficulty.


It was further hypothesised that all actors would accept more responsibility for
success than for failure. In order to test this hypothesis, a measure of internality of
attributions was computed by subtracting the sum of task difficulty and luck
attributions from that of ability and effort attributions (cf. Arkin et al., 1981).
Student's t comparisons for paired observations within each level of SES revealed,
as expected, that the attributions of High SES pupils were internal after success
(M = 2.83) but not after failure (M = -0.44), t = 12.35, P < 0.001. Those of
Low SES pupils did not support the hypothesis, since they were neither internal
nor external after success (M = 0.44) but were markedly external after failure
(M = -4.74), t = 16.34, P < 0.001. In other words, while High SES pupils took
credit for success and shared responsibility for failure with external causes, Low SES
pupils shared responsibility for success while locating failure outside themselves.
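
The internality measure and the paired comparison described above can be sketched as follows. This is a minimal illustration only: the function names and the ratings in the usage note are hypothetical, and the paper does not report its computational procedure beyond the subtraction rule and the use of Student's t for paired observations.

```python
from math import sqrt
from statistics import mean, stdev


def internality(ability, effort, task_difficulty, luck):
    """Internal minus external attributions for one pupil and one outcome.

    Positive values indicate predominantly internal attributions,
    negative values predominantly external ones.
    """
    return (ability + effort) - (task_difficulty + luck)


def paired_t(first, second):
    """Student's t for paired observations; returns (t, degrees of freedom)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

For example, a pupil rating ability 7, effort 6, task difficulty 3 and luck 2 (invented values) would receive an internality score of (7 + 6) - (3 + 2) = 8. Passing each pupil's internality score after success and after failure to `paired_t` gives a comparison of the kind reported above.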


Attributions to the self and to similar and different others 

The relevant hypotheses suggested that: (1) commonly found differences
between actor and observer attributions would not be obtained in the present
comparisons of self and similar other attributions, since observers of similar others 
were provided with more information than usual about both the observer and the 
task; (2) observers would rate the outcomes of similar and different others 
differently in accordance with stereotyped perceptions as to what constitutes an 
expected outcome for a member of a particular social group. Thus the outcomes of a 
High SES other would be attributed by pupils of both SES levels to factors 
consistent with generalised expectancy for success (to ability after success and effort 
after failure), while those of a Low SES other would be attributed to factors other 
than ability after success, and to lack of ability after failure. 


These hypotheses were tested using separate one way analyses of variance (by 
Attributor Condition) within each level of SES, for each attributional rating. When 
the derived F was significant, Scheffé’s test with P = 0.05 was employed to test for
differences among group means. 
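
As a sketch of the omnibus step in this procedure, the F ratio for a one-way analysis of variance can be computed as below. The function name and the data in the usage note are hypothetical, and the Scheffé follow-up is not implemented here. With three attributor conditions the between-groups degrees of freedom are 2, as in the F (2,113) values reported in this section.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of ratings."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within
```

Called with three lists of ratings (self, similar other, different other) for one attributional factor within one SES level, the function returns the F ratio together with its degrees of freedom.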


As could be predicted on the basis of the results reported above, the analyses 
for High SES pupils yielded only two significant effects: for attributions to task
difficulty after failure, F (2,113) = 8.43, P < 0.0004, and for attributions to ability after
failure, F (2,113) = 3.91, P < 0.02. In both cases Scheffé’s test revealed a
significant difference between the means of actors as compared with observers, with 
pupils attributing their own failure more to task difficulty and less to lack of ability 
than they did that of others. These results thus disconfirm the hypothesis that there 
would be differences between the attributions for similar and different others, but 
provide partial support for the hypothesis that differences between attributions for 
the self and similar others would be minimised in the present design. 


The picture yielded by the analyses for Low SES pupils was quite different.
There were significant effects of Attributor Condition for attributions to all factors.
As hypothesised, the success outcomes of a different other were attributed less to
luck, F (2,113) = 13.00, P < 0.001, and to effort, F (2,113) = 10.35, P < 0.001, and
more to ability, F (2,113) = 27.17, P < 0.001, than were those of the self or a similar
other, a pattern consistent with high expectancy for success. Thus Low SES pupils
attributed the success of a High SES pupil much as did the High SES actors
themselves (see Table 1). The pattern of attribution after failure also followed that
obtained from High SES actors, but differences between conditions were less clear.
While luck was perceived as a less important determinant of different other than of
own or similar other failure, F (2,113) = 9.97, P < 0.001, the failure of both the self
and a different other was perceived as due less to task difficulty, F (2,113) = 5.05,
P < 0.008, and to lack of ability, F (2,113) = 17.71, P < 0.001, than was that of a
similar other. Effort attributions were higher for both different and similar other
failure than for own failure, F (2,113) = 5.34, P < 0.006.


The results also revealed significant differences between self and similar other 
conditions for all attributions save those to ability after failure, and thus disconfirm 
the hypothesis of no differences between self and similar other conditions. The 
results indicated that similar other success was perceived as due more to the external 
factors of luck and easiness of task and less to effort than was own success. Failure 
attributions to the internal factors of effort and ability were higher in the similar 
other than the self condition, as were attributions to task difficulty. Luck, however, 
was perceived as more determining of own than of other failure. Thus in general 
Low SES pupils attributed more responsibility for success to themselves and more 
responsibility for failure to similar others. 


DISCUSSION 


The most striking findings of the present study pertained to differences in the 
attributional patterns and preferences of High and Low SES pupils. Discussion of 
the results will thus focus primarily on the implications of these patterns for the role 
of generalised expectancies in determining perceptions of the causes of outcomes. 
Pupils of High SES attributed their own successes primarily to ability and their 
failures to effort, a pattern which has been associated with high expectancy for 
future success (Fontaine, 1974). These findings are similar to those of other studies 
of college students from similar social backgrounds, and seem to confirm the 
hypothesis that the attributions of such pupils would be consistent with perceptions 
that success is a more expected outcome than failure. However, the additional 
finding that these pupils tended to attribute the outcomes of both similar and 
different others much as they did their own did not support the hypothesis that 
observer attributions would be affected by stereotyped perceptions as to what 
constitutes an expected outcome for pupils from different social groups. 


This lack of differentiation might seem to indicate that the manipulation was 
unsuccessful — i.e., that pupils continued to focus on and attribute their own 
outcomes in all conditions. This seems unlikely in view of the tendency of High SES 
actors to attribute failure in a more ego-enhancing way than did observers, and of 
the finding that the attributions of Low SES pupils differentiated consistently 
between self, similar other and different other conditions. In this case, the patterns 
of High SES pupils seem to reflect an internally consistent model for conceptualising 
the determinants of outcomes, not only for themselves but also for the world in 
general. Their expectancy of success seems reasonable if we can assume that most 
socially advantaged pupils at this grade level are generally presented with learning 
tasks within their capabilities, so that perseverance in the face of initial difficulty 
will usually be crowned with success. In addition, these perceptions are compatible 
with the tendency of teachers to encourage effort attributions, and to value hard 
work. Moreover, the influence of these teacher patterns seems to peak at the age of 
the pupils studied here (Bar-Tal and Guttmann, 1981). 


Thus the High SES pupils studied here seem to be in a state of 'attributional 
certainty,' having constructed a model of causality vis-à-vis the outcome of learning 
tasks which is largely confirmed both by their own experiences and by their 
environment. In view of the suggestion that attributional activity will increase under 
uncertainty (Kruglanski, 1978), the implication seems to be that they will not seek 
new clarifying information as to the relationships between causes and outcomes, and 
will not revise their perceptions under the impetus of isolated disconfirming 
experiences such as that provided by the present study. Here they continued to 
attribute outcomes fairly transparently dependent on task difficulty to ability or 
effort, and this even in observer conditions where they could see that they and others
had succeeded or failed on identical tasks. Moreover, since their model reflects their 
teachers’, they have not yet found it necessary to develop different theories for 
different kinds of pupils, and were thus unresponsive to the background 
information provided. Thus, if the results for this group failed to support the 
hypothesis that differences in generalised expectancies would be reflected in 
different patterns of attribution, this may well have been because their expectancies 
are more generalised than was thought — they have the same expectancies and 
model for everyone. 


This interpretation seems more inclusive and reasonable than one which might 
attribute the relatively weak tendency to an ego-enhancing bias in actor attributions 
in this study to utilisation of the additional task information provided by the study. 
In a somewhat similar design with college students Frieze and Weiner (1971) found 
that college students attributed the outcomes of the observed subject to task 
difficulty if these were the same as those reported for other students. However, the 
rather similar information provided here by the observers’ own outcomes did not 
serve to increase the salience of task difficulty in observer as compared with actor 
perceptions, even though pupils of this age should be able to utilise comparative 
information (Boggiano and Ruble, 1979). Rather, actors attributed failure more to 
task difficulty than did observers. This different pattern may well reflect developmental
trends. If so, one can predict that these pupils’ perceptions will become more
differentiated in time, as they themselves are exposed to a more varied range of 
academic subjects and outcomes, as they become more aware of the fact and 
implications of differences in the learning abilities of their classmates, and as they 
become less influenced by the values of their teachers. 


The results for pupils of Low SES were more compatible with the original 
hypothesis. Actors attributed their own success to a combination of internal and 
external factors — to effort and luck — and their failures to external causes — luck 
and task difficulty. It is possible that these patterns, too, reflect experiences in 
classrooms where disadvantaged pupils have to try harder than others to succeed, 
and where their outcomes may really depend largely on task difficulty and on luck 
— for example, whether the teacher happens to ask them easy or difficult questions. 
Moreover, attributions to luck and task difficulty do not explicate what a pupil can 
or should do to succeed in the future, beyond resorting to the defensive strategies 
described by Holt (1964). In addition, they run counter to the tendency of teachers 
to attribute failure to lack of effort. This tendency in turn runs counter to 
impressions that, at least in the early grades, most pupils do in fact try quite hard, 
but some learn that greater effort does not always promise success. Thus it is 
possible that disadvantaged pupils are in general less clear than advantaged ones 
about the determinants of their own outcomes, not because they perceive outcomes 
as random, but because the model transmitted by teachers seems to explain the 
outcomes of some pupils, but not of their own. 


If so, one would expect them to be more responsive than advantaged pupils to 
information which might help clarify the real determinants of outcomes. This 
implication is supported by the finding that Low SES observers attributed the 
outcomes of different (High SES) others much as did the High SES others 
themselves. Indeed, the order of means was identical — to ability, task difficulty, 
luck and effort after success and to effort, task difficulty, luck and ability after 
failure, and were thus consistent with a generalised expectancy of success rather than 
failure. The interpretation that this pattern was directly evoked by the information 
provided about the observed other is further supported by the quite different 
attributions received from Low SES observers of similar others. However, while it 
was hypothesised that similar generalised expectancies for failure rather than success
would yield similar patterns of attribution in actor and similar other conditions, 
there were differences also between these conditions. 


The tendency to credit similar others with even less responsibility for success, 
while blaming them more for failure, is compatible with findings, reported for an 
Israeli sample by Shuval (1969), that disadvantaged minorities tend to devaluate 
their own group. In the present context, such a pattern may serve to bolster
self-esteem by enabling perceptions that there are others who are even less successful and
in control of their outcomes than oneself. However, the attribution of similar other
outcomes primarily to task difficulty and luck is also consistent with the previous
suggestion that these factors may indeed be perceived by Low SES pupils as the
major determinants of their outcomes in the classroom. In addition, the relatively 
strong attributions of similar other failure also to lack of ability may reflect most 
closely these pupils’ suspicions that, if their outcomes are so dependent on external 
factors, this must be due, at least in part, to their own shortcomings, since other, 
successful, pupils do not seem to be so affected by luck and task difficulty. This 
pattern may have emerged less clearly in actor attributions not only because of
ego-defensive tendencies but also because it differs from teacher values and attributions.
Thus the tendency of disadvantaged pupils to attribute outcomes to external, 
unstable factors does not seem to derive from beliefs about a world in which there is 
no necessary relation between behaviour and outcomes. Rather, they seem to have 
differentiated social perceptions whereby their own experiences are construed 
against those of others who are perceived as more successful and more in control of 
their outcomes. However, the discrepancies between their own and others’ 
outcomes, and between their own experiences and teacher values, leave them 
uncertain as to the determinants of their own outcomes. 


The further implication that these pupils are influenced by teacher attributions 
to effort suggests also that motivational interventions aimed at encouraging
attributions of failure to effort (Dweck, 1975) may be redundant, or even damaging, 
for such pupils. A programme designed to encourage diagnostic and differential 
attributions, so that pupils can recognise when they have succeeded owing to ability 
or prior learning, when extra effort will yield success and when task difficulty is such 
that effort alone will not suffice, would seem more beneficial in reducing the 
attributional uncertainty uncovered by the present study and in encouraging more 
adaptive perceptions of their own outcomes. 


However, while the idea that pupils’ causal attributions may be importantly 
mediated by the extent to which their experiences in school facilitate perceptions that 
they understand, and therefore have some control over, the causes of their 
outcomes, seems to provide a parsimonious framework for understanding the rather 
complex findings of this study, some reservations should be noted. Restriction of 
attributional options to preordained causal factors, while facilitating comparison 
with other studies, necessarily prevents attribution to additional factors which may 
be perceived as even more central by the subjects themselves. Specifically, the 
present results suggest that in future subjects should also be given the option of 
responding that they do not know why they received a particular outcome. In 
addition, further studies are under way to examine the degree to which the 
attributions of successful and unsuccessful pupils for outcomes more closely related 
to actual school experiences can be importantly distinguished according to the 
degree to which they feel that they understand the determinants of their own and 
others’ results. 


Requests for reprints should be addressed to Dr. Ruth Butler, School of Education, The 
Hebrew University of Jerusalem, Israel 91905. 


THE RELATIONSHIP BETWEEN EXPECTANCY AND
ACADEMIC ACHIEVEMENT — HOW CAN IT BE 
EXPLAINED? 


By FRED VOLLMER 
(University of Bergen, Norway) 


Summary. The aim of the present study was to explain the relationship between 
expectancy and subsequent academic achievement. A modified version of effort 
calculation theory was used to generate hypotheses regarding determinants and effects of 
expectancy in the academic achievement situation. The hypothesis that past achievement, 
work spent in examination preparations, and perceived ability determine both expectancy 
and later examination performance, and thereby account for the relationship between the 
latter two variables, did not find support. With preparation work, past grades, and 
perceived ability controlled, expectancy still related to subsequent grades. The hypothesis 
that expectancy determines effort expenditure in the examination situation, and thereby 
grades, was also not supported. It was suggested that expectancy as an expression of
self-confidence might be more strongly related to style of working in an exam situation than
to an energy dimension like effort expenditure. 


INTRODUCTION 


Many empirical studies have found positive linear relationships between expectancy 
and subsequent academic achievement (e.g. Crandall, 1969; Morrison et al., 1973; 
Kovenklioglu and Greenhaus, 1978; Malloch and Michael, 1981; Kimball and Gray, 
1982; Vollmer, 1984). In the studies cited, 26 significant correlations are reported 
ranging from 0.28 to 0.80 with a median of 0.45. The main aim of the present paper
is to discuss how this well-documented relationship can be understood. 


One possible explanation is that what people expect to achieve, and later on in 
fact achieve on academic examinations, has some common determinant (or 
determinants). It is reasonable to assume, e.g., that expectancy of future 
performance is based on previous achievement. Previous performance may also be 
related to actual later achievement (e.g., by way of some common ability or 
motivation factor). In six of the studies reported above (Keefer, 1969; Holen and
Newhouse, 1976; Kovenklioglu and Greenhaus, 1978; Bernstein et al., 1979; Malloch
and Michael, 1981; and Vollmer, 1984) measures of previous academic achievement 
were controlled. But only in one case (Keefer, 1969) did the relationship between 
expectancy and subsequent grades then disappear. 


What other variables may determine both expectancy and subsequent academic 
achievement? Among theories of achievement motivation, effort calculation 
theories have most explicitly analysed the determinants of expectancy. According to 
Kukla (1972) and Meyer (1973), expectancy on achievement related tasks is 
determined by three variables: (1) task difficulty, (2) intended effort expenditure, 
and (3) perceived ability. 


The task difficulty variable, defined by Meyer (1973) as objective difficulty 
(specified by social norms), is not relevant in the present situation since for 
examinations people usually have to work on the same task(s), or tasks of equal 
objective difficulty. 


The second variable is more interesting. It is generally assumed that the more 
effort a person decides to exert on a future task, the higher his/her expectancy of 
success will be. In some of the reported studies on expectancy and subsequent
grades, expectancy was measured at the beginning of a work period (term or 
semester). In these cases expectancy estimates may have been based on intended 
future effort expenditure. According to effort calculation theory, intended effort 
expenditure determines actual effort expenditure and thereby also quality of 
performance. In these studies, then, intended effort expenditure may have been a 
common antecedent of expectancy and grades. 


In many of the studies of expectancy and academic achievement, however, 
expectancy was measured at the end of a long work period. It is, of course, possible 
that in such cases expectancy is determined by how much effort a person intends to 
expend in the future examination situation, and that this intention also determines 
subsequent effort output and performance quality. But a more plausible application 
of effort calculation theory in this kind of situation is that expectancy estimates are 
based on perceptions of past effort expenditure. It is also reasonable to assume that 
how hard a person has worked in preparing for an examination determines his/her 
level of knowledge and thereby performance quality. An important common 
antecedent of expectancy measured shortly before an examination and of 
subsequent grades may, then, be past effort expenditure. 


In a previous study (Vollmer, 1984) goal setting was assumed to be a
determinant and an indicator of previous work (Locke et al., 1981). Goal setting, in
fact, proved to be strongly related to expectancy measured shortly before 
examination, supporting the hypothesis that work determines expectancy. The 
relationship between goal setting and actual grades, however, was rather weak. And 
when goal setting was controlled, expectancy still accounted for a substantial 
portion of variance in grades. This, in turn, indicates that work, although it does 
seem to be a common determinant of both expectancy and subsequent grades, 
cannot fully account for the relationship between the latter two variables. It can be 
objected, however, that goal setting is only a very indirect measure of work. And, 
before the hypothesis that previous work explains the expectancy-performance
relationship is rejected, it would seem wise to measure more directly how hard 
people have worked in preparing for an examination. 


Finally, effort calculation theory predicts that when task difficulty and effort 
expenditure variables are held constant, persons with high perceived ability will 
expect to perform better than persons with low perceived ability. Empirical support 
for the hypothesis that task-specific expectancy is related to a more general 
dimension of intellectual self-confidence (or perceived ability) has previously been 
provided by Meyer (1981) and Rustemeyer (1982) for experimental tasks, and 
Holahan and Kelly (1978) and Motowidlo (1981) for university examinations. 


Actual performance may also be determined by perceived ability if it is assumed 
that the latter variable reflects actual ability. Perceived ability may also have 
motivational consequences and thereby influence performance. 


An aim of the present study, then, is to find out if the relationship between 
expectancy measured shortly before an examination and subsequent achievement 
can be accounted for by the possible common antecedents: previous achievement, 
past effort expenditure, and perceived ability. 
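
The logic of asking whether a common antecedent accounts for a relationship can be illustrated with a first-order partial correlation. This is only a sketch under stated assumptions: the function names are hypothetical, a single control variable is used for simplicity, and the text so far does not specify which statistical procedure the study itself employs.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y with the control variable z held constant."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

If expectancy (x) and subsequent grade (y) were related only because both depend on, say, past achievement (z), `partial_r` would fall to about zero. A partial correlation that stays clearly above zero indicates a link that the control variable does not explain.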


While expectancy and subsequent achievement may have common antecedents, 
it is also possible that expectancy influences subsequent performance through some 
effect of its own. A fundamental assumption in all theories of achievement 
motivation is thus that expectancy is an independent determinant of motivational 
activation which, in turn, influences quality of performance. The central idea in 
effort calculation theory is that before performing an achievement related activity, a 
person calculates expectancies for different possible levels of effort expenditure,
taking into account task difficulty and perceived ability. The level of effort then 
actually chosen is assumed to be the one with the highest expected utility. 


According to Kukla (1972), people who expect to fail even if they work very 
hard, will not decide to expend a high level of effort; neither will those who expect to 
succeed by expending low effort. While Kukla’s expectancy dimension has only two 
values (probability of success = 1 or 0), Meyer (1973) has postulated similar 
relationships between continuous probability of success values and intended/actual 
effort expenditure. It is assumed that people who believe their chances of succeeding 
on a task are very low, even if they work as hard as possible, will neither intend nor 
expend a high level of effort. Those with very high probability of success estimates 
provided they work hard will also have low intended and expended effort. A person 
will choose to work hard when a high level of effort expenditure is associated with 
an intermediate probability of success, for, according to Meyer, it is under such 
conditions he/she can expect to obtain new information on her/his ability (cf. 
Meyer, 1973; Meyer and Hallermann, 1977). 
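
Kukla's two-valued version of this reasoning can be sketched as a simple decision rule. The sketch is a toy illustration, not the theory's formal statement: the function name is hypothetical, and the additive rule by which perceived ability and effort jointly overcome task difficulty is an assumption introduced here for the example.

```python
def intended_effort(perceived_ability, task_difficulty, effort_levels):
    """Return the lowest effort level at which success is expected.

    Toy assumption (not from the source): success is expected whenever
    perceived_ability + effort >= task_difficulty.
    """
    levels = sorted(effort_levels)
    for effort in levels:
        if perceived_ability + effort >= task_difficulty:
            return effort
    # Failure is expected even at maximum effort, so effort is withheld.
    return levels[0]
```

Under this rule, both a person who expects to succeed with little effort and a person who expects to fail whatever the effort end up at the lowest level. High effort is chosen only when it is believed to be necessary and sufficient for success.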


Effort calculation theory is formulated to fit tasks with two outcomes (success 
or failure), and experimental situations where the goal of performance can be 
obtaining new information on ability (Meyer), or succeeding by expending minimal 
effort (Kukla). With regard to such tasks and situations it is natural to conceptualise 
expectancy as probability of success, and also reasonable to assume that 
intermediate expectancy may be associated with a higher level of subsequent effort 
expenditure than both low and high expectancy. In real life achievement situations 
like university examinations, however, there are typically several possible outcomes 
(in some cases very many) that can be obtained, and the immediate goal of most 
students is simply to get as good a grade as possible. In such situations it is natural to 
conceptualise expectancy as expected grade rather than probability of success. It is, 
furthermore, hard to find reasons for assuming that expectancy of an intermediate 
grade should be associated with a higher level of motivation than expectancy of a 
high grade. In all the studies previously referred to on the relationship between 
academic achievement and expectancy, the latter variable was in fact defined as 
expected grade, and all the reported correlations between expectancy and grades 
were positive and linear. 


While these results, then, obviously do not fit effort calculation theory as 
presently formulated, they may still be understood as being in agreement with a 
general principle of the theory. According to that principle, motivation (intended 
and actual effort expenditure) on a future task is primarily determined by the 
person’s beliefs regarding what can be obtained by working hard. Applied to the 
academic achievement situation this principle may be specified in the following way. 
People will work hard if the expected reward is good; they will not work hard for 
nothing or for some mediocre result. Whereas low expectancy is discouraging, high 
expectancy is encouraging. According to this formulation, a simple linear 
relationship between expectancy measured shortly before examination and effort 
expenditure in the examination situation is to be anticipated. If, in turn, it is 
assumed that how hard people work in the examination situation determines, at least 
in part, how well they perform, then the hypothesis can be formulated that 
expectancy determines subsequent grades through effort expenditure. 


In a previous study (Vollmer, 1984), the effort expenditure hypothesis found 
only partial support. For, while effort expenditure in the examination situation did 
predict subsequent grades in both sex groups, only for men did expectancy relate to 
effort expenditure. And both for women and men, when previous achievement, goal 
setting, and effort expenditure in the examination situation were controlled, 
expectancy still related to grades. These findings indicate, in turn, that expectancy is 
linked up with subsequent performance in some other way than just through effort 
expenditure. 


To sum up, the present study seeks answers to the following questions. With 
factors like previous grades, amount of work spent in examination preparations and 
perceived ability controlled, does expectancy still relate to subsequent grades? If 
such a relationship exists, to what extent does expectancy determine grade through 
effort expenditure in the examination situation? Is there evidence of some other 
connection between expectancy and performance than through the link of expended 
effort, or by way of common antecedents? 


METHOD 
Subjects were 107 female and 38 male students taking an undergraduate 
psychology examination at the University of Bergen. The uneven numbers of male 
and female participants in the present study represent population characteristics in 
that the examination was taken by a total of 141 women and 59 men. 


Variables and measuring procedure 
(1) Past achievement 

The most recent indicator of past performance common to all persons was 
grades on a university entrance examination (examen philosophicum). Grades on 
Norwegian university exams range from 1.0 (best possible) to 4.0 (including decimal 
values, e.g., 2.1, 2.2, etc.). While this grading system in principle comprises 31 
different outcomes, in practice values on the scale below 2.0 are seldom used. In 
connection with correlational analyses the scale was reversed so that high numbers 
mean good grades. 
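
The text does not say how the reversal was computed. A minimal sketch, assuming the usual reflection about the scale midpoint (reversed = 1.0 + 4.0 - grade); the function name `reverse_grade` is hypothetical:

```python
def reverse_grade(grade, best=1.0, worst=4.0):
    """Reflect a grade so that high numbers mean good grades.

    Assumption (not stated in the paper): a simple reflection about the
    midpoint of the 1.0-4.0 scale.
    """
    if not best <= grade <= worst:
        raise ValueError("grade outside the 1.0-4.0 scale")
    # Round to one decimal to suppress floating-point noise.
    return round(best + worst - grade, 1)


print(reverse_grade(1.0))  # the best grade maps to the top of the reversed scale
print(reverse_grade(2.1))
```

Because the transformation is linear, it changes only the sign of a correlation, never its size.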


(2) Self-confidence 

In order to measure general perceived ability in connection with academic 
achievement, a special Norwegian test was constructed, called the Self-confidence 
test. The instruction was: "Examination is a situation all of us know very well. 
Different people, however, can react to the situation in different ways. The aim of 
this questionnaire is to find out what reactions are typical for you. You will be 
presented pairs of descriptions of a number of possible reactions, thoughts, and 
feelings which may be typical for a person facing an exam. 


For instance: 


I usually suffer from                        I seldom have sleep 
insomnia in connection     A  B  C  D  E     problems in connection 
with exams.                                  with exams. 


Decide which of the two descriptions fits you best. If you usually suffer from 
insomnia, check off the letter A. If you feel that usually is too strong an expression, 
but that all the same you often have sleep problems, the letter B can be used. If you 
seldom have sleep problems, check off E. If you feel that seldom is not quite 
appropriate but that you have a slight sleep problem, you can use the letter D. If you 
sometimes have problems, other times not, and one is no more typical than the 
other, mark the letter C in the middle of the scale." 


The total scale consisted of the following seven items: 

(1) If I have worked hard in preparation, I am usually optimistic and expect to 
do well. 

(2) Before an exam I am usually well concentrated and ready to do my best. 

(3) I am usually satisfied with my preparations for an exam. 

(4) In connection with exams I usually have high self-confidence and trust 
that my intellectual abilities are good. 

(5) The last few days before an exam I try to relax and take it easy. 

(6) Usually I experience exams as exciting and stimulating. 

(7) Before an exam, I seldom wonder whether there are topics I don’t know 
enough about. 


Item 4, it can be seen, asks directly about how a person conceives of his/her 
intellectual ability in connection with examinations. The main principle underlying 
formulation of the other items, and derived from effort calculation theory, is the 
assumption that a given amount of preparation work will be associated with higher 
expectancy, self-confidence, and feeling of being well-prepared for those with high 
perceived ability than for those with low perceived ability. Items 1, 3 and 7 are based 
on this principle. Feeling confident and well prepared, in turn, can be assumed to be 
associated with positive attitudes toward the examination (items 2 and 6) and being 
able to relax and rest before the examination (item 5). 


(3) Work spent in preparing for the examination 

Students were asked to indicate how many semesters full-time work they had 
spent in preparing for the examination. A 7-point scale was used ranging from "½ 
semester" to "more than 3 semesters". Students were also asked: "How many 
hours per week on the average do you spend studying? This question may be 
difficult to answer because how much one works may vary from day to day and 
from week to week. Try, however, to describe a typical week." As an overall 
estimate of how much work a person had invested in studying for the examination, 
number of hours per week was multiplied by number of semesters. 


(4) Expected grade 

Expectancy was evaluated by asking students one week before examination to 
"estimate as realistically as possible the grade (1.0–4.0) you in fact think you will 
get on the exam". 1.0 is the best possible grade. The scale was reversed in 
connection with correlational analyses so that high numbers mean high expectancy. 


(5) Effort expended in the examination situation 

As a kind of persistence measure, number of words produced in writing 
examination papers was used. As discussed in an earlier study (Vollmer, 1984), word 
output is probably no pure measure of motivation, but may reflect knowledge as 
well. Length of paper can be "purified", however, by controlling the most 
important determinant of knowledge, namely amount of work spent in preparing 
for the examination. 


(6) Examination grades 

Grades on the psychology examination ranged from 1.0 (best possible) to 4.0. 
In connection with correlational analyses the scale was reversed so that a high 
number means good performance. 


RESULTS 


An item analysis of the self-confidence test was performed. Alpha coefficients 
of 0.75, 0.76 and 0.76 were obtained for women, men and total group respectively. 
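
For readers unfamiliar with the statistic, the alpha coefficient can be computed from the item variances and the variance of the summed score. The sketch below is illustrative only: the responses are invented, and the coding of the letters A to E as 1 to 5 is an assumption, not something the paper reports.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents x columns of items."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers (A-E coded 1-5) from four respondents to seven items.
demo = [
    [4, 4, 3, 4, 2, 3, 3],
    [2, 3, 2, 2, 3, 2, 1],
    [5, 4, 4, 5, 3, 4, 4],
    [3, 2, 3, 3, 2, 2, 2],
]
print(round(cronbach_alpha(demo), 2))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances.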


Intercorrelations between the main variables: grades (V6), examination effort 
expenditure (V5), expectancy (V4), past work (V3), self-confidence (V2) and past 
achievement (V1), are shown for women and men in Table 1. 


TABLE 1 
VARIABLE INTERCORRELATIONS FOR WOMEN AND MEN 

Women 

        V2       V3       V4       V5       V6 
V1     0.24*    0.24*    0.24*    0.45**   0.46** 
V2              0.24*    0.46**   0.10     0.23* 
V3                       0.36**   0.26*    0.25* 
V4                                0.30**   0.45** 
V5                                         0.61** 

Men 

        V2       V3       V4       V5       V6 
V1     0.40*    0.00     0.06     0.42**   0.57** 
V2             -0.06     0.29     0.11     0.45** 
V3                       0.53**   0.34*    0.31 
V4                                0.22     0.45** 
V5                                         0.54** 

Note: 
V1 = past grade.       V4 = expectancy. 
V2 = self-confidence.  V5 = exam effort. 
V3 = work.             V6 = grade. 
* P < 0.05, ** P < 0.01, two-tailed test. 


As it is theoretically meaningful to assume that expectancy may be related to 
motivation in a curvilinear way (cf. e.g., Atkinson, 1974a, and Meyer and 
Hallermann, 1977), and that motivation may relate to performance quality in a 
similar non-linear way (cf. e.g., Atkinson, 1974b), tests for curvilinear relationships 
between these variables were performed. No such relationships were found. 


The six variables described, and the hypothetical relationships between them (as 
derived from a modified version of effort calculation theory) were set up in a path 
diagram (Figure 1). 


FIGURE 1 
PATH DIAGRAM SHOWING POSSIBLE RELATIONSHIPS BETWEEN GRADES ON A UNIVERSITY EXAM (V6), 
EFFORT EXPENDITURE (V5), EXPECTANCY (V4), AMOUNT OF WORK EXPENDED IN EXAM PREPARATION 
(V3), SELF-CONFIDENCE (V2), AND PREVIOUS ACHIEVEMENT (V1) 

[Path diagram; legible node labels: V1 Past grade, Self-confidence, V3 Preparation, V5 Effort.] 


70 Expectancy and Academic Achievement 


Assumed common determinants (V1-V3) of expectancy and later grades are 
portrayed to the left of both expectancy and grades, and unidirectional arrows have 
been drawn from each of these determinants to both expectancy and grades. The 
possibilities that past work (as an indicator of knowledge), self-confidence and past 
achievement may determine length of paper, are shown by arrows from V1-V3 to 
V5. If the common antecedent hypothesis is correct, control of V1-V3 should lead to 
the result that expectancy no longer predicts grades. No assumptions were made as 
to directions of relationship between V1-V3. These variables are, consequently, 
linked by bi-directional curved arrows. The hypothesis that with factors like past 
grades, preparation work, and self-confidence controlled, expectancy will still have 
an effect on subsequent performance, and that this effect is primarily an indirect 
one, mediated by effort expenditure on the examination, is represented by 
unidirectional arrows from V4 to V5 and from V5 to V6. Finally, the possibility that 
expectancy has some other effect on performance quality than just by energising, is 
represented by a direct unidirectional arrow from expectancy to grades. 


Because of the relatively small number of male subjects (N =38), the initial 
data-analysis was not performed separately for the two sexes. Instead, appropriate 
equations for the full model depicted in Figure 1 were written and beta coefficients 
for the total group computed by multiple regression analysis. The model was then 
trimmed by deleting all paths with non-significant regression weights. Two-tailed 
t-tests were used to test the significance of the beta weights. The goodness of fit of 
this trimmed model was then tested for men and women separately by use of the 
LISREL program (Jöreskog and Sörbom, 1981). Regression weights for the full 
model for all subjects are presented in Figure 2. 
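
As an illustration of how standardised regression (beta) weights follow from a correlation matrix, the sketch below solves the normal equations R_xx b = r_xy for grades (V6) regressed on V1 to V5. It uses the women's correlations from Table 1 only because total-group correlations are not reported, so it will not reproduce the coefficients in Figure 2; the function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def standardized_betas(r_xx, r_xy):
    """Beta weights from predictor intercorrelations and criterion correlations."""
    return solve(r_xx, r_xy)


# Women's correlations from Table 1: predictors V1..V5, criterion V6.
r_xx = [
    [1.00, 0.24, 0.24, 0.24, 0.45],
    [0.24, 1.00, 0.24, 0.46, 0.10],
    [0.24, 0.24, 1.00, 0.36, 0.26],
    [0.24, 0.46, 0.36, 1.00, 0.30],
    [0.45, 0.10, 0.26, 0.30, 1.00],
]
r_xy = [0.46, 0.23, 0.25, 0.45, 0.61]
for name, beta in zip(["V1", "V2", "V3", "V4", "V5"], standardized_betas(r_xx, r_xy)):
    print(name, round(beta, 3))
```

Each equation of the path model (for V4, V5 and V6) would be solved in the same way with its own set of predictors.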


FIGURE 2 
FULL MODEL WITH REGRESSION COEFFICIENTS FOR TOTAL GROUP 


With antecedent variables controlled, expectancy still related to grades. There 
was an indirect effect of expectancy on grades through effort expenditure as well as 
a direct relationship. The indirect effect, however, was small (0.176 × 0.398 
= 0.07). In addition to expectancy and effort expenditure, past performance was 
also found to be an independent determinant of grades. Past grades also had a 
considerable indirect effect on subsequent grades through effort expenditure (0.17), 
possibly indicating a motivational link between past and later performance. Effort 
expenditure on the exam was also related to past work. Finally, expectancy was 
independently determined by past work and self-confidence, but not past 
achievement. 
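
In path analysis an indirect effect is the product of the coefficients along the chain. A small sketch of the arithmetic quoted above, reading 0.176 as the expectancy-to-effort path and 0.398 as the effort-to-grades path (an interpretation of the text, since Figure 2 itself is not reproduced here); the function name is hypothetical:

```python
def indirect_effect(*paths):
    """Product of the path coefficients along one causal chain."""
    product = 1.0
    for coefficient in paths:
        product *= coefficient
    return product


# Coefficients quoted in the text: expectancy -> effort, effort -> grades.
via_effort = indirect_effect(0.176, 0.398)
print(round(via_effort, 2))  # 0.07
```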


In the trimmed model, then, the following four paths were deleted: V1 to V4, 
V2 to V5 and V6, and V3 to V6. Path coefficients for the reduced model were 
estimated separately for women and men by the maximum likelihood method of 
LISREL and are shown in Figures 3 and 4. 


FIGURE 3 
TRIMMED MODEL FOR WOMEN 


Average residuals were 0.031 for women and 0.036 for men. Corresponding 
chi-square values for goodness of fit were 2.8 (df = 4, P = 0.59) and 3.23 
(df = 4, P = 0.52). The trimmed model, in other words, fitted the data about equally 
well in both sex groups. According to the LISREL solution two beta coefficients in 
the trimmed model were non-significant in both sex groups. These were for effects 
of expectancy and preparation on effort expenditure. 
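
The reported probabilities can be checked from the chi-square values. The sketch below uses the closed-form upper-tail probability that holds for even degrees of freedom; it is a verification aid, not the LISREL computation, and the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


print(round(chi2_sf_even_df(2.8, 4), 2))   # 0.59 (women)
print(round(chi2_sf_even_df(3.23, 4), 2))  # 0.52 (men)
```

Large P values here mean that the trimmed model's implied correlations do not differ significantly from the observed ones.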


DISCUSSION 


The assumptions, derived from effort calculation theory, that expectancy 
measured shortly before an examination is determined by past effort expenditure 
and perceived ability, found clear support. Previous performance on the other hand 
was not found to be an independent predictor of expectancy. The latter finding may 
have to do with the nature of the past examination which was an entrance 
examination consisting of questions in logic, philosophy, and psychology. The 
subject matter of this past examination is somewhat different from the content of 
the subsequent one (undergraduate psychology) for which expectancy estimates were 
obtained. Results on the entrance examination may consequently not have been 
judged by students as relevant for what to expect on the later one. 


When previous grades, amount of work spent in examination preparations and 
perceived ability were controlled, expectancy still related to subsequent grades. The 
assumption that the relationship between expectancy and subsequent performance 
could be explained by common antecedents thus did not find support. This finding 
accords with results of a previous Norwegian study (Vollmer, 1984). 


Expectancy did not determine grades primarily through effort expended in the 
examination situation. For although examination effort expenditure related to 
grades, expectancy was only weakly related to effort expenditure. There is evidence, 
in other words, of some other connection between expectancy and performance than 
through the link of examination effort expenditure. This also accords with previous 
results (Vollmer, 1984). In the earlier study, however, expectancy was also found to 
affect grades by way of determining examination effort expenditure, but only in a male 
group. 


A problematic point in the present study is the use of length of paper as a measure 
of effort expenditure and motivation. In experimental motivation studies effort 
expenditure (or persistence) is commonly measured by noting how many tasks the 
subject has attempted to solve. Often the tasks are quite simple, like adding figures. 
Output levels in such situations are probably not affected by individual differences 
in previous knowledge (though ability factors probably have a determining 
influence), and can consequently be taken as relatively pure measures of motivation. 


But, as discussed in a previous paper (Vollmer, 1984), in real-life achievement 
situations like academic examinations, "pure" measures of motivation are hard to 
come by. Thus output measures like number of words produced are probably 
influenced by knowledge level. It can be argued that a person cannot write a long 
paper unless he/she has a certain amount of knowledge. 


This does not mean, however, that length of paper has nothing to do with 
motivation. For while knowledge to some extent can be assumed to determine how 
much it is possible for a person to write, knowledge level can never alone determine 
how much the person actually writes. One person may work hard, utilise his 
possibilities maximally, and produce lots of information concretely formulated in a 
long paper. Another person, knowing equally much, may not make maximal use of 
what he knows, but work half-heartedly, producing a short paper with little 
information. Such differences, in turn, in how hard people work and try to do their 
best, concretely expressed in how much they write, are motivational differences. 


To some extent, then, differences in output level in the examination situation 
must be regarded as expressions of motivational differences, and not just as 
indicators of differences in knowledge. It should also be noted that by using the 
method of path analysis, the word output variable has been "purified" to a certain 
extent by controlling a probable indicator of knowledge level, namely preparation 
work. 
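
One simple way to see what such statistical control does is a first-order partial correlation. The sketch below is an illustration using the Table 1 correlations, not the path-analytic procedure actually used in the study: it removes preparation work (V3) from the relationship between word output (V5) and grades (V6). The function name is hypothetical.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Table 1 values: effort-grade (V5, V6), work-effort (V3, V5), work-grade (V3, V6).
women = partial_corr(0.61, 0.26, 0.25)
men = partial_corr(0.54, 0.34, 0.31)
print(round(women, 2), round(men, 2))
```

In both groups the effort-grade correlation shrinks only slightly once preparation work is held constant.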


If the relationship between expectancy and subsequent performance cannot be 
explained by a set of common antecedents, nor by assuming that expectancy is a 
determinant of how hard people work in the examination situation and thereby of 
performance quality, the question remains: how can the relationship be explained? 
In a previous paper (Vollmer, 1984) it was speculated that if expectancy is an 
expression of self-confidence or perceived ability, perhaps expectancy may relate to 
how people work on intellectual tasks, and not just to a pure energy dimension like 
effort expenditure. "It seems reasonable to assume, e.g., that persons with high 
self-confidence may in connection with exams be much more willing to be independent, 
critical, problem-oriented, and suggest novel solutions than persons with low 
self-confidence. It is also probable that such approaches yield higher payoffs in terms of 
grades than more descriptive strategies" (p. 11). In the present study a relationship 
between expectancy and perceived ability was indeed found, making it the more 
relevant to investigate whether the link between expectancy and academic 
performance may be style of work. 


Requests for reprints should be addressed to Dr. F. Vollmer, Department of Personality 
Psychology, University of Bergen, Sydnesplass 9, N5000 Bergen, Norway. 


THE COMPARATIVE EFFECTS OF STRATEGY 
AND EFFORT ATTRIBUTIONS 


By MARGARET M. CLIFFORD 
(University of Iowa, USA) 


Summary. The effects of attributing academic failure to effort or strategy were examined 
with the use of college students and teachers as subjects. Both the attitude and future 
performance of a college freshman described as receiving a GPA of 1.8 (C-) on her first 
semester's work were judged to be significantly more positive when poor performance 
was attributed to strategy in contrast to effort. A measure of perceived affect produced 
an interaction which indicated that college teachers, but not students, perceived affect to 
be more positive under effort attributions. 

A secondary purpose of this research was to examine the effects of having subjects 
judge perceived affect either before or after making judgments on attitude and future 
performance. Data suggest that the initial processing of affect may lead to more positive 
judgments of attitude and future performance on the part of teachers but less positive 
judgments on the part of students. The theoretical and practical implications of these 
findings are discussed. 


INTRODUCTION 


It has been well established within attribution theory that the stability dimension 
and its related internal attributions of ability and effort have far-reaching 
behavioural and affective consequences. Expectancies of future success are 
relatively low when failure is attributed to a stable cause such as lack of ability and 
relatively high when failure is attributed to an unstable cause such as lack of effort 
(McMahan, 1973; Fontaine, 1974; Weiner et al., 1976). Attributions associated with 
the internal stability dimension are also most often used in explaining academic 
outcomes (Simon and Feather, 1973; Bailey et al., 1975; Frieze, 1976; Arkin and 
Maruyama, 1979). In addition, it is this internal stability dimension that figures 
most prominently in attribution training programmes (Dweck, 1975; Chapin and 
Dyck, 1976; Sparta, 1978; Zoeller, 1979; Weiner, 1980) which have generally 
demonstrated the advantages of attributing failure to lack of effort rather than to 
lack of ability (Dweck, 1975; Andrews and Debus, 1978). 


There are, however, several reasons to examine further and delineate the 
internal stability dimension. More specifically, there are reasons to identify other 
internal attributions with which achievement outcomes can be explained. First, 
while like all other causal dimensions, internal stability is assumed to be a 
continuum (Weiner et al., 1971; Abramson et al., 1978), it is consistently 
operationally defined only as a dichotomy with effort and ability representing low 
and high stability, respectively. Such a discrepancy between theory and research 
seems unwarranted and unwise. Second, while effort (in contrast to ability) 
attributions for failure have been found to produce favourable performance results 
on the part of the actor, effort attributions for failure elicit unfavourable responses 
(e.g., lower grades) on the part of the observer or evaluator (Weiner and Kukla, 
1970; Rest et al., 1973). Third, while high effort on the part of students may help 
minimize teacher punishment in the event of failure (Weiner and Kukla, 1970; 
Covington and Omelich, 1979), it also poses a substantial threat to self-concept on 
the part of the performer. For failure combined with high effort elicits low ability 
explanations (Kun and Weiner, 1973; Covington and Omelich, 1979). Fourth, effort 
attributions for failure have been found to produce feelings of shame and/or guilt 
(Nicholls, 1976; Sohn, 1977; Brown and Weiner, 1984). Fifth, and most 
importantly, both self-worth theory (Beery, 1975; Covington and Beery, 1976) and 
egotism (Frankel and Snyder, 1978; Snyder et al., 1978) postulate that individuals 
may intentionally reduce effort so that they can then use effort attributions to 
explain failure, thereby avoiding attributions to low ability. Such reduced effort 
constitutes a major threat to academic learning. 


To address the question of what attributions other than effort and ability might 
be added to the internal-stability continuum, one might examine causal explanations 
that have been elicited from subjects with the use of open-ended formats. Such 
research indicates that in addition to ability and effort, attitude, mood, and physical 
condition are the internal attributions most frequently used to explain outcomes 
(Frieze, 1973, 1976; Bar-Tal and Darom, 1979; Cooper and Burger, 1980). 
However, neither mood nor physical condition is likely to become a target variable 
in attribution training programmes; nor are these attributions likely to optimize 
affect, attitude, and/or performance. As for attributions to attitude, they are 
complex functions of several other variables, and thus are likely to have limited 
value as predictors or intervention variables in either a theoretical or practical 
model. There are, however, at least two additional variables that might represent 
intermediate values on the internal-stability continuum; namely, skill and strategy. 
The resulting four internal attributions and their definitions would be: ability defined 
as potential or aptitude, skill defined as ability that has been fully developed, strategy 
defined as subskills or methods and techniques used to develop skills, and effort defined 
as attentiveness to task demands (including time spent on task). 


While a case can be made for delineating the internal stability continuum with 
the addition of both skill and strategy, the present discussion and research is focused 
on the addition of strategy and more specifically on the comparison of strategy and 
effort attributions. The placement of strategy between effort and ability on the 
internal stability dimension implies that strategy is less stable than ability and more 
stable than effort. The validity of this assumption is debatable. On the one hand, it 
can be argued that both effort and strategy can be changed instantly and at any time 
— implying that these attributions might be equally unstable. On the other hand, it 
can be argued that attributions to inappropriate strategy often imply the need to 
either develop a new strategy or to experiment selectively with other available 
strategies. In this case, strategy appears somewhat more stable than effort. 


Regardless of the relative stability of strategy and effort, there is good reason to 
compare the effects of these attributions. The potential advantages of strategy 
attributions appear to be many: first, strategy attributions might allow one to reduce 
or escape the guilt associated with not trying or attending. Second, they may enable 
one to escape the embarrassment and public shame associated with being stupid. 
Third, and perhaps most importantly, strategy attributions might turn failure 
outcomes into problem-solving situations in which the search for a more effective 
strategy becomes the major focus of attention. This search and exploration for 
improved strategy can be expected to elicit increased effort without the fear that 
subsequent failure will automatically and immediately imply low ability. 


Despite these apparent advantages, and for all the studies in which attributions 
have been freely elicited from subjects, strategy as such has never emerged as an 
important attributional factor (Weiner et al., 1971; Frieze, 1976; Bar-Tal and 
Darom, 1979; Cooper and Burger, 1980). It should be noted, too, that most subjects 
participating in attribution identification studies have been either college students or 
teachers and that most failure situations experimentally presented as a means of 
eliciting attributions are educational in nature and setting. 


The absence of strategy attributions in the educational world is probably no 
accident. Too few teachers and students are knowledgeable about learning and 
instructional strategies or psychological principles of rehearsal, coding, imagery, 
practice, meaningfulness, sequencing, etc. Thus, this low incidence of strategy 
attributions may primarily signal a weakness in our teacher-preparation curricula. 
However, the fact that academic learning tends to be perceived as a simple by- 
product of ability and effort should not keep us from examining the potential value 
of strategy attributions. There is increasing evidence that learning can be favourably 
affected by the teaching of learning strategies (Brown and Campione, 1978; Brown, 
1981; Cook and Mayer, 1983). Given this finding, it is not unreasonable to predict 
that strategy attributions may also enhance motivation and learning. 


As a motivational variable, and more specifically as an attributional variable, 
relatively little is known about strategy. Anderson and Jennings (1980) referred to 
it as the "overlooked factor", and conducted an experimental study which 
demonstrated that subjects who attributed failure at a persuasion task to ineffective 
strategies subsequently made more changes in their persuasive strategies and made 
more favourable predictions about their future success at the persuasion task than 
did subjects who attributed failure to lack of ability. Furthermore, they reported 
that, "strategy subjects made [future success] estimates as high as those of no 
attribution subjects who had experienced success in their initial persuasion attempt" 
(p. 403). However, it should be noted that this study did not provide for a 
comparison of strategy and effort attributions. 


In a related study, Anderson (1983) had subjects preselected for either their 
tendency to attribute outcomes to changeable or to unchangeable causes engage in a 
blood drive phone-solicitation task. These subjects were assigned to one of three 
experimental conditions: no attribution, ability-trait attribution, or strategy-effort 
attribution. In support of the researchers' predictions, subjects who made strategy- 
effort attributions, whether by experimental manipulation or by preselection, were 
found to expect more success and more improvement with practice. These strategy- 
effort subjects also showed higher motivation and task success than did ability-trait 
oriented subjects. 


It is important to note, however, that strategy and effort were combined and 
thus confounded in this study. The strategy-effort attribution manipulation 
included the following comment, "That is, some people may do well because they 
try very hard to come up with the right tactic or approach to persuade the people 
they are calling. People who fail may do so mainly because they do not try and do 
not try to come up with effective strategies" (p. 1140). 


Because of the absence of effort attributions in the Anderson and Jennings 
(1980) study and the combining of effort and strategy attributions in the Anderson 
(1983) study, the relative merits of effort and strategy attributions cannot be judged. 
This pair of studies simply indicates that strategy attributions for failure elicit more 
constructive responses to failure than do ability attributions. 


Thus, the following pair of research studies was designed primarily to compare 
these two attributions and their effects on affect, attitude, and future performance 
expectations. A secondary concern for the present pair of studies was the ordering of 
the dependent variables. Pilot data had suggested that the assessment of affect 
immediately following failure and prior to the assessment of attitude and future 
performance, might result in less favourable attributional effects in general. Thus, 
two attributional conditions (effort and strategy) and two placements of the affect 
measure (first and last) were independently manipulated to test the prediction that 
strategy attributions produce more constructive effects than do effort attributions, 
and to examine the question, does the placement of affective judgments affect the 
nature of those judgments and the nature of other judgments related to a failure 
event? 


METHOD 


Subjects 

For Study 1, one male and one female liberal arts college teacher were randomly 
selected from each of 42 college catalogues representing a midwestern state. These 
catalogues were obtained from a university library. Of the 84 subjects selected for 
this study, 54 returned completed research materials; this represented a return rate 
of 64 per cent. Exactly half of the respondents were male. 


For Study 2, 50 female students enrolled in an introductory educational 
psychology course served as subjects and were given course credit for their 
participation. Data obtained from male students were not analysed because there 
were too few male subjects. 


Materials and procedures 

The materials used in both studies were identical and consisted of a situational 
description and a set of response items used to assess perceived affect, attitudes, and 
future performance of the student portrayed in the situational description. The 
situational description began with the following statement: 


Lynn Carstin obtained a 1.8, or C-, grade point average for her first semester’s 
course work in college. Before registering for the second semester, each student 
was required to discuss his or her previous semester’s performance with an 
academic advisor. Lynn and her advisor spent an hour discussing Lynn’s first 
semester’s grades, her study habits, intellectual ability, her effort, her learning 
strategies, social and recreational interests, reasons for going to college and 
many other factors that might have affected Lynn’s first-semester performance. 
After a frank and candid discussion they completed the following Student 
Characteristic Rating form. 


The student characteristic rating form, supposedly completed by Lynn and her 
advisor, consisted of the category headings, ability, effort, and strategy with 
subheadings and mean rating statements as follows: 


ABILITY 
High School Rank in Class 
High School Grades 
College Entrance Exams 
I.Q. as measured in high school 
Standardised achievement test profile for high school 
Overall ABILITY Rating (sum of scores ÷ 5) = 


EFFORT 
Time spent studying 
Class attendance record 
Meeting assignment deadlines 
Attentiveness during lectures 
Completeness and neatness of class notes 
Overall EFFORT Rating (sum of scores ÷ 5) = 


STRATEGY 
Setting course and test goals 
Recording assignments and grades 
Building a course glossary 
Rephrasing lectures and chapters in one’s own words 
Reading and outlining chapters before and after lectures 
Overall STRATEGY Rating (sum of scores ÷ 5) = 


To the right of each subheading was a numbered rating scale (1 = Very Poor, 
2 = Poor, 3 = Fair, 4 = Good, 5 = Very Good, 6 = Excellent), with one rating per 
subheading clearly circled and indicating the judgments of Lynn and her advisor. 
The circled ratings and average category ratings were used to manipulate attribution 
for failure. Two versions of this form were prepared: the effort attribution version 
reflected low effort (mean rating = 2.2) and high strategy (mean rating = 5.2), while 
the strategy attribution version reflected low strategy (mean rating = 2.2) and high 
effort (mean rating = 5.2). Ability (mean = 3.0) was held constant across the two 
versions. 


Following the situational description and Student Characteristics Rating form 
were eleven 7-point response continua. Four of these continua, anchored with the 
terms pride-shame, encouragement-discouragement, hope-despair, and 
happiness-sadness, were headed with the caption "Lynn's feelings about college", 
and served as the basis for the dependent measure of affect. Four other continua, 
anchored with the words determined-indifferent, favourable-unfavourable, 
good-bad, and responsible-irresponsible, were headed with the caption "Lynn's 
attitude toward college", and served as the basis for the dependent measure of 
attitude. Three continua, anchored with the phrases improved GPA-no improved 
GPA, improved strategies-no strategy improvement, and increased effort-no 
effort increase, were headed with the caption "Lynn's future performance in 
college", and served as the basis for the dependent measure of future performance. 
Responses were scored such that the lower the value, the more favourable the 
response. To subjects in Study 1, the situational description and response items were 
mailed with a self-addressed stamped envelope and a cover letter soliciting subject 
participation. To subjects in Study 2, these materials were distributed during a 
research activity period in a classroom setting. 


RESULTS 


Data for each of the three dependent variables obtained from subjects in Study 
1 were initially analysed using a 2x2x2 (Sex x Attribution x Feeling-Placement) 
ANOVA program which allowed for unequal Ns. Because there were no interactions 
involving the Sex factor and no Sex main effects for any of the three dependent 
measures, these data were combined with data from Study 2. Each of the three 
dependent variables for the combined data set was then analysed as a 2x2x2 
(Sample x Attribution x Feeling-Placement) design using an ANOVA program 
which allowed for unequal Ns. All simple effect tests reported below are based on 
the Tukey-Kramer modification (Kirk, 1982) and the 0.05 level of significance. A 
minimum value of q = 2.83 (2, 60) can be assumed for significant follow-up tests. 
The means for the dependent variables are presented in Table 1. 
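
The follow-up statistic named above can be sketched in a few lines. This is a minimal illustration of the standard Tukey-Kramer form of the studentized range statistic for unequal cell sizes, not the authors' program. The error mean square is not reported in the article, so the value of `mse` used below is purely hypothetical, as are the function and variable names.

```python
import math

Q_CRITICAL = 2.83  # minimum q for two means and 60 error df, as quoted in the text


def tukey_kramer_q(mean_a, mean_b, n_a, n_b, mse):
    """Studentized range statistic with the Tukey-Kramer adjustment for unequal Ns."""
    standard_error = math.sqrt((mse / 2.0) * (1.0 / n_a + 1.0 / n_b))
    return abs(mean_a - mean_b) / standard_error


# Hypothetical error mean square (NOT reported in the article).
hypothetical_mse = 1.0

# Affect means under the strategy condition: teachers (N = 23) vs students (N = 25).
q = tukey_kramer_q(4.72, 4.03, 23, 25, hypothetical_mse)
print(round(q, 2), q >= Q_CRITICAL)
```

With equal cell sizes the expression reduces to the ordinary Tukey statistic, which is why the modification is only needed for designs with unequal Ns such as this one.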

For affect there was a significant Attribution x Sample interaction, F = 5.23 (1, 
95), P < 0.05. Means and follow-up tests indicated that students judged affect to be 
slightly more positive or desirable when failure was attributed to strategy (M = 4.03) 
in contrast to effort (M = 4.40), whereas college teachers judged affect to be 
significantly more positive when failure was attributed to effort (M = 4.06) in 
contrast to strategy (M = 4.72). Furthermore, under the strategy condition, students 
judged affect to be significantly more positive than did college teachers. No other 
main effects or interactions were significant for the measure of affect. Attitude 
yielded a significant Sample x Feeling-Placement interaction, F = 4.20 (1, 95), 
P < 0.05. Means and follow-up tests indicated that students judged attitude to be 
significantly more positive when feelings were assessed last (M = 3.54) rather than 
first (M = 4.22); a non-significant reversal of this pattern was shown for college 
teachers. The only other significant effect for attitude was the attribution main 
effect, F = 25.09 (1, 95), P < 0.001; attitude was judged to be more positive when 
TABLE 1 
MEANS FOR DEPENDENT VARIABLES FOR COMBINED DATA SET 

Condition (N)                  Affect   Attitude   Future Performance 
College Students 
  Effort Attribution 
    Feelings First (13)         4.50     4.71       2.87 
    Feelings Last (11)          4.27     4.02       2.82 
  Strategy Attribution 
    Feelings First (12)         4.23     3.69       2.25 
    Feelings Last (13)          3.85     3.13       2.23 
College Teachers 
  Effort Attribution 
    Feelings First (15)         3.77     4.33       2.93 
    Feelings Last (16)          4.34     4.73       3.79 
  Strategy Attribution 
    Feelings First (10)         4.63     3.20       2.77 
    Feelings Last (13)          4.79     3.37       3.03 

Low values represent positive or desirable ratings. 


failure was attributed to strategy than when it was attributed to effort. For the 
future performance measure there was an Attribution main effect, F = 6.88 (1, 95), 
P < 0.01, and a Sample main effect, F = 7.73 (1, 95), P < 0.01. Strategy produced 
future performance expectations that were significantly more positive than those 
produced in the effort attribution condition, and students expressed future 
performance expectations that were significantly more positive than those expressed 
by teachers. No other effects for future performance were significant. 
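
The condition means quoted in the text can be recovered from the cell means in Table 1 by weighting each cell by its N. The sketch below does this for the affect measure. The cell sizes and means are transcribed from Table 1; the dictionary layout and function name are illustrative choices, and small differences in the second decimal place reflect rounding in the published cell means.

```python
# (N, mean affect) for the Feelings First and Feelings Last cells, from Table 1.
AFFECT_CELLS = {
    ("students", "effort"): [(13, 4.50), (11, 4.27)],
    ("students", "strategy"): [(12, 4.23), (13, 3.85)],
    ("teachers", "effort"): [(15, 3.77), (16, 4.34)],
    ("teachers", "strategy"): [(10, 4.63), (13, 4.79)],
}


def weighted_mean(cells):
    """Mean over cells, weighting each cell mean by its number of subjects."""
    total_n = sum(n for n, _ in cells)
    return sum(n * mean for n, mean in cells) / total_n


for condition, cells in AFFECT_CELLS.items():
    print(condition, round(weighted_mean(cells), 2))

# Error degrees of freedom for the 2x2x2 design: total N minus the eight cells.
total_n = sum(n for cells in AFFECT_CELLS.values() for n, _ in cells)
error_df = total_n - 8
print(total_n, error_df)
```

The same weighting applied to the attitude column reproduces the students' Feelings First and Feelings Last means reported for the Sample x Feeling-Placement interaction.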


DISCUSSION 


These data suggest that a reliable distinction between effort and strategy 
attributions can be made. In addition, these data support the prediction that strategy 
attributions for academic failure produce more positive and constructive judgments 
than do effort attributions. Both students and teachers perceive a student’s attitude 
and future performance to be enhanced by attributing failure to ineffective 
strategies rather than to lack of effort. This would seem to have major implications 
not only for educators, but also for counsellors, parents, and others in the business 
of trying to motivate individuals who have experienced failure or disappointing 
levels of performance. But given that strategy does not appear to be a frequently 
used attribution (as evidenced by its absence in free-response attribution studies) 
there may be a need to increase people’s awareness of this attribution. There may 
also be a need to actually identify strategies relevant to the behaviour of concern 
(e.g. academic achievement, financial management, socialising). 


The significant Attribution x Sample interaction for affect (i.e., students 
judged affect to be slightly more positive under strategy attributions while teachers 
judged affect to be significantly more positive under effort attributions) is difficult 
to explain. It is conceivable that strategy is perceived to be more stable and/or more 
closely linked to ability by teachers than by students. But if such a perceptual 
difference were used to explain this interaction, a similar interaction would be 
expected for future performance, and that was not observed. It is worth noting, 
however, that McNabb (1983) in a recent study also found a trend which indicated 
teachers judge student affect but not future performance to be more positive under 
effort in contrast to strategy attributions. The controversies on the affective 
consequences of effort and ability (Brown and Weiner, 1984; Covington and 
Omelich, 1984; Weiner and Brown, 1984) may be further complicated by the 
addition of strategy attributions. In any event, it would seem that affect has been 
and will continue to be more difficult to predict from attributions than are future 
performance expectations or attitude. 


The significant Sample x Feeling-Placement interaction for attitude suggests 
that the processing of feelings or affect before the processing of attitudes and future 
expectations leads to relatively less positive ratings on the part of students and 
relatively more positive ratings on the part of teachers. This interaction would seem 
to warrant further study, especially in view of the fact that trends consistent with 
this interaction are present for both of the other two dependent variables. It is 
possible that teachers view an initial acknowledgment of shame and/or regret on 
the part of a failing student as a prerequisite for, or means of enhancing, the 
student's motivation to better his or her subsequent performance. An initial 
emphasis on feelings is consistent with Rogerian counselling theory (Rogers and 
Wood, 1974; Rogers, 1977) which stresses the importance of discussing and "getting 
in touch with one's feelings". 


It is possible, however, that focusing initially on future expectations and 
attitudinal variables such as determination and perceived responsibility may actually 
lead to more constructive motivation and behaviour change on the part of an 
individual who has experienced an undesirable outcome. Such a focus of attention 
might also mitigate the negative affect associated with the outcome itself. This 
postulation is consistent with the rational-emotive therapy counselling techniques 
recommended by Ellis (1979a, 1979b). This Sample x Feeling-Placement interaction 
suggests that educators may favour a performance-analysis procedure that is 
relatively unconstructive for students. 


Another important issue raised by the present data regards the nature of the 
stability dimension. Based on earlier studies which compared the internal 
attributions of ability and effort, it was postulated that as stability decreases the 
probability of performance improvement increases (Dweck and Reppucci, 1973; 
Weiner, 1974, 1979; Dweck, 1975; Weiner and Sierad, 1975; Abramson et al., 1978; 
Meyer, 1980). The implied linear function between stability and behaviour change 
would appear to be challenged by the data obtained from the present study. If one 
accepts the contention that strategy is less stable than ability and more stable than 
effort, and if one accepts the findings that indicate both effort and strategy 
attributions generate more favourable responses than do ability attributions 
(Anderson and Jennings, 1980; Anderson, 1983), a curvilinear in contrast to a linear 
function would appear to describe better the relationship between stability and 
future performance as well as the relationship between stability and attitude. 


The notion that an intermediate level of stability (typified by strategy 
attributions) may produce optimum effects is not incompatible with achievement 
motivation theory which postulates that tasks of intermediate difficulty are 
relatively appealing (Atkinson, 1964; Atkinson and Birch, 1970); or with attribution 
theory, which postulates that moderately difficult tasks offer maximum information 
about one's skill and are, therefore, preferred over more difficult or easier tasks 
(Trope, 1975; Trope and Brickman, 1975; Meyer et al., 1976); or with intrinsic 
motivation theories which postulate that moderate challenge is a prerequisite for 
intrinsically motivating tasks (Csikszentmihalyi, 1975). 


On the other hand, to conclude or even imply that behaviour in general is a 
curvilinear function of the stability dimension would be premature and unwise. 
McNabb (1983) using ability, strategy, and effort to represent high, medium, and 
low stability, respectively, found support for a linear relationship between the 
stability of the attribution and teacher support for a failing junior-high school 
student. More specifically, she found that a composite measure of teachers’ 
sympathy, grading, and willingness to help students decreased as the attribution for 
failure went from ability to strategy to effort. Thus, there may be measures which 
are a linear function of the stability dimension and others that are a curvilinear 
function of this dimension. In any event, the need to redefine not only the stability 
dimension but all the attributional dimensions in terms of continua with three or 
more operationally defined values before judging the functions of these dimensions 
seems apparent. 
THE USE OF THE BRITISH ABILITY SCALES 
WITH A GROUP OF AUSTRALIAN SCHOOLCHILDREN: 
SOME PRELIMINARY DATA 


By JAMES WARD AND LYNNE OUTHRED 
(Macquarie University, NSW, Australia) 


SUMMARY. In a preliminary study, the four core tests of the British Ability Scales were administered 
to a sample of 208 Australian schoolchildren aged 9 years 6 months to 10 years 5 months. In general 
the results obtained were comparable with the UK standardisation group although there was a tendency 
towards higher scores by the Australian sample. Analyses of the scale characteristics of the items 
were carried out using the Rasch method and a normal ogive model (NOHARM). The Rasch scaling appears 
to be robust with this population. The NOHARM analysis produced strikingly similar results. Sample 
estimates of reliability and validity were obtained and were found to be acceptable. 


INTRODUCTION 


Although the British Ability Scales (Elliott et al., 1982) are now well accepted in the 
United Kingdom and are attracting increasing attention elsewhere, thus far there appears to be 
no published evidence of their use overseas. This is somewhat surprising as the Scales clearly 
have potential for detailed clinical enquiry and possess some psychometric refinement in the 
use of Rasch's method for scaling. 


This report, therefore, describes a recent try-out of four tests of the scale, the project 
having the following objectives: 

(1) to obtain representative data on the use of the test, 

(2) to gather observations concerning such issues as ease of administration and 
potential for clinical enquiry, 

(3) to investigate the replicability of the Rasch scaling method by comparing the item 
difficulty parameters obtained from the original norming sample with those 
obtained for the Australian sample, 

(4) to compare the results of item selection using a maximum likelihood estimation 
procedure (based on the Rasch model) with a least squares solution (based on a 
normal ogive model). 


METHOD 


Subjects consisted of 108 girls and 100 boys in the age range 9 years 6 months to 10 years 5 
months, drawn from 15 Sydney schools, selected so as to form a reasonably representative 
sampling basis. 


Administrative procedure 

The test was administered by six psychologists (enrolled in the degree of Master of Arts 
(School Counselling) at Macquarie University), each having received a training course in the 
administration of the test. Children were tested on four scales: Matrices, Similarities, Recall of 
Digits and Speed of Information Processing. Data from this set of scales form the minimum 
necessary to obtain an IQ estimate for children aged 8 years or more. 


Methods of analysis 
The children's responses on these four core tests were analysed using two computer 
programs: 

(1) NOHARM which is based on R. P. McDonald's (1980) approximation to the 
normal ogive curve by a function of orthogonal polynomials, assuming that the 
latent trait has a normal distribution. 

(2) MLTBIN3 which follows a procedure used by Wright (Andrich, 1980) to estimate 
parameters in the Rasch model. In this method the parameter values are obtained 
by unconditional maximum likelihood estimation, with person and item 
parameters being estimated jointly. The program assumes a common discrimination 
parameter for each item. 
RESULTS 

(1) Comparison of Australian and British data 

The distributions of the British sample for each range and score group are not given in the 
manual. However, abilities are tabulated at each age level for each centile and also for each 
raw score group. Thus it was possible to obtain the approximate percentage of the British 
sample falling into each score group category (for a given age range). This could then be 
compared with the percentage of the Australian sample in each category. The results are 
shown in Figure 1. The distributions of the Australian and British samples are similar for each 
scale, with the most marked discrepancy in the Speed of Information Processing scale. Even 
so, the majority of the Australian sample seem to obtain higher scores than the British sample. 


(2) Comments upon administration 
(i) Matrices. It was reported that the required figures were easy to draw, even though they 
involved quite difficult problems. No problem was reported in interpreting the figures. 


(ii) Similarities. These items were found to be easy to administer and score. They also helped 
to obscure failure so that, in general, the children liked this test. However, the single practice 
example was considered insufficient to introduce the scale for some children. It was also 
noticed that two items had specific British terms: one used "pilchard" and the other 
demanded "footwear" as a category name, whereas the common Australian term is "shoes". 


(iii) Recall of digits. It was noticeable that several items had repetitious sequencing of some 
digits, making it likely that a child would pass that item, only to be faced by a further series of 
consecutive failures before reaching the discontinuation criterion. The comment was made 
that this made the task of maintaining rapport more difficult. 


(iv) Speed of information processing. This scale could be anxiety producing as each page was 
corrected and any errors indicated to the child before going on to the next page. It was 
strongly recommended therefore that a criterion for discontinuation should be incorporated 
with this scale. These results should be treated with caution since they depend on the child's 
number concepts, skills with paper and pencil, visual acuity and hand-eye co-ordination as 
well as processing speed. 


(3) Item difficulty estimates 

The Rasch model is a restricted form of the logistic model and contains two independent 
parameters: person ability and item difficulty. The usual method of fitting the Rasch model 
assumes a common discrimination parameter. The BAS manual gives scaled difficulty 
estimates for the items in each scale. These are scaled as 10(b + (1 − θ)), where b is the difficulty 
estimate (in logits) for item g and θ is the ability estimate (in logits) for a raw score of 1 on the 
scale. These were correlated with the difficulty estimates from MLTBIN3 (scaled as 4.35b + 
50). The correlation coefficients were 0.97 (Matrices), 0.96 (Similarities), 0.99 (Recall of 
Digits) and 0.89 (Speed of Information Processing). These results seem to indicate that the 
British scaling procedures are appropriate for the Australian sample. The somewhat smaller 
correlation coefficient for the Speed of Information Processing scale is influenced by two 
items, 11 and 15 (Test C). If these are omitted the correlation coefficient is 0.96. 
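
Because both sets of difficulty estimates are linear transformations of logit values, the different scalings do not affect the correlation between them. The sketch below illustrates this, reading the manual's scaling as 10(b + (1 − θ)) with θ the ability estimate for a raw score of 1; that reading, the item difficulties, and the value of θ are all assumptions made for illustration and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bas_scale(b, theta_raw_score_1):
    """Manual scaling as read here: 10 * (b + (1 - theta))."""
    return 10 * (b + (1 - theta_raw_score_1))


def mltbin3_scale(b):
    """Scaling applied to the MLTBIN3 estimates: 4.35 * b + 50."""
    return 4.35 * b + 50


# Hypothetical logit difficulties for five items in two samples.
british_logits = [-2.1, -1.0, -0.2, 0.7, 1.9]
australian_logits = [-1.9, -1.2, 0.1, 0.6, 2.2]

r_logits = pearson(british_logits, australian_logits)
r_scaled = pearson(
    [bas_scale(b, theta_raw_score_1=-3.0) for b in british_logits],
    [mltbin3_scale(b) for b in australian_logits],
)
print(round(r_logits, 4), round(r_scaled, 4))
```

The two printed values agree, so the reported coefficients can be read as correlations between the underlying logit difficulties.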


(4) Comparison of the results from NOHARM and MLTBIN3 

Theoretically there should be very little difference between the parameter estimates from 
the two programs as the logistic and normal ogive models are very closely related. However, 
the estimates are obtained by quite different mathematical techniques. Choppin (1980) has 
argued in favour of the relative mathematical simplicity of the Rasch model, asserting that 
"other more complex models require samples running into the thousands in order to obtain a 
similar degree of reliability in parameter estimation". This claim was not substantiated by the 
results for this sample of 208 children, as the correlation between the item estimates exceeded 
0.99 for each of the four scales: indeed, the residuals in the matrix obtained from the NOHARM 
program were all small (compared to 1/N), indicating that the model fitted the data extremely 
well. 
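
The closeness of the two models can be checked numerically. The following sketch compares the logistic response function with the normal ogive after the conventional rescaling of the logistic argument by 1.702; that constant is the usual textbook value and is not taken from the article. The function names are illustrative.

```python
import math


def logistic(z):
    """Logistic response function."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_ogive(z):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rasch_probability(ability, difficulty):
    """Rasch model: probability of a correct response, both parameters in logits."""
    return logistic(ability - difficulty)


D = 1.702  # conventional scaling constant aligning the logistic with the normal ogive

# Largest absolute gap between the two curves over a grid from -4 to +4.
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(logistic(D * z) - normal_ogive(z)) for z in grid)
print(round(max_gap, 4))
print(rasch_probability(ability=0.5, difficulty=0.5))  # equal ability and difficulty
```

The gap stays below one percentage point across the range, which is consistent with the near-identical item estimates the two programs produced.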


(5) Reliability and validity 
Tentative estimates of reliability (KR20) were obtained and proved satisfactory for three 
scales (KR20 = 0.85 or above). The Similarities scale which, as has been mentioned above, 





FIGURE 1 
COMPARISON BETWEEN BRITISH AND AUSTRALIAN SAMPLES ON FOUR CORE TESTS OF THE BAS 
[Figure not reproduced. Panels: Matrices, Similarities, Recall of Digits, Speed of Information Processing. Horizontal axis: score group; vertical axis: percentage of sample; separate curves for the British and Australian samples.] 


had difficulties over item ordering, showed somewhat less internal consistency (KR20 = 0.70). 
Based upon a sample of 50 taken randomly from the total group, promising estimates of 
congruent validity were obtained. The BAS correlated well with the WISC-R (r = 0.83), and 
with a group test of general ability. 
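
For readers unfamiliar with the KR20 coefficient, a minimal sketch of the Kuder-Richardson formula 20 for dichotomously scored items follows. It uses the population variance of total scores, which is one common convention and an assumption here; the response matrix is a made-up example, not data from the study.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for a persons-by-items matrix of 0/1 scores."""
    n_persons = len(responses)
    n_items = len(responses[0])

    # Variance of total scores (population form, dividing by N).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons

    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_persons
        sum_pq += p * (1.0 - p)

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical responses of five children to four items ordered by difficulty.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(example), 2))
```

Items that are out of difficulty order tend to weaken the relation between item and total scores, which lowers the coefficient, as reported for the Similarities scale.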


DISCUSSION 


The results of this limited try-out indicate, therefore, that the BAS is extremely promising 
for use with Australian children. Although there are minor difficulties with the item order in a 
core test, the scaling procedures appear to be robust and, in general, the test was easy to 
administer. However, there is a need for more extensive data regarding the performance of the 
whole scale and for further estimates of reliability and validity. For this purpose the results 
from a large scale try-out will be combined with those reported above and reported in due 
course. Further interest centres around the use of the scale for experimental and clinical 
purposes. Preliminary work suggests that it is appropriate for a study of strategy preference in 
learning-disabled students and will therefore be incorporated into a study of such children. 
Cumulative data from the use of the test are also becoming available from its routine 
administration in psycho-educational clinics. These data will provide further information as to 
its ability to produce useful diagnostic profiles and its predictive validity. 
EXTRAVERSION AND SEX DIFFERENCES IN READING PERFORMANCE 
IN EIGHT-YEAR-OLD CHILDREN 


By R. J. RIDING AND J. COWLEY 
(Department of Educational Psychology, University of Birmingham) 


SUMMARY. 48 children aged 7-8 years were given the Junior Eysenck Personality Inventory and three 
reading tests which differed in the amount of context provided and the task required: (a) the Burt 
Graded Reading Test (words without context), (b) the NFER Primary Reading Test 2 (PRT2) 
(sentence completion), and (c) material taken from the Neale Analysis of Reading Ability (passage 
comprehension) of two levels of difficulty. Similar significant interactions were found between sex 
and extraversion in their effect on performance for all three types of test such that for boys, 
extraverts did best, while for girls introverts were superior. Further, significant interactions were 
observed for the passage comprehension between passage difficulty and mode of reading 
(aloud/silently), in that reading aloud was superior for high difficulty and reading silently for low, 
and between passage difficulty and extraversion, in that ambiverts performed least well with the difficult 
material. 


INTRODUCTION 


Studies of individual differences in reading performance have frequently concentrated on 
one variable at a time. The intention of the present study was to consider a possible interaction 
between personality and sex in their effect on children's reading of a range of materials. 
Research into personality and reading performance and into sex differences in attainment will 
be considered in turn. 


Personality and reading performance 

The personality dimension of introversion-extraversion has been found to relate to 
reading performance with extraverts superior to introverts. Eysenck and Cookson (1969) gave 
the Schonell Graded Reading Test and the Junior Eysenck Personality Inventory (JEPI) to 
11-year-old children. Reading test scores of the extraverts were markedly superior to those of 
the introverts. Elliott (1972), using a sample of lower junior school children, found significant 
correlations between reading achievement on the Schonell Graded Word Reading Test and 
extraversion score on the JEPI. Savage and Savage (1973), using the Watts Reading Test 
(sentence completion) and the JEPI in a study of children between the ages of 7 and 10, found 
that extraversion was positively related to reading attainment. 


Sex differences in reading performance 

Several studies have found evidence of a developmental trend of sex differences in 
reading attainment. A larger proportion of boys than girls make a slow beginning at learning 
to read, but by 10 years of age population differences between boys and girls are no longer 
apparent. 


Group tests (silent-reading). Mean attainments in both comprehension and reading 
vocabulary were lower for boys than girls in results obtained by Carroll (1948) for 6- to 7-year- 
olds, by Dykstra and Tinney (1969) for 6- to 8-year-olds, by Elley and Reid (1969) for 8-year- 
olds and by Gates (1961) for 8- to 9-year-olds. Reading comprehension only was examined by 
Hughes (1953) for 8- to 9-year-olds and by Preston (1962) and Rutter et al. (1970) for 9-year- 
olds, and the mean attainments of boys were lower than those of girls. In word recognition, 
mean attainments were reported lower for boys than girls by Southgate (1959) for 6- to 7-year- 
olds, and by Kellmer Pringle et al. (1966) for a national sample of 7-year-olds. For children 
between 10 and 15 years of age sex differences in mean reading attainments tend not to be 
significant. 


Individual tests (reading-aloud). Differences between boys and girls were reported to be 
not significant for individually administered tests of oral reading by Vernon (1938) for 6- to 
9-year-olds and by Elliott (1972) for 8-year-olds. This finding is not confined to oral reading 
of unconnected words. Interestingly, Neale (1966) reported that no consistent sex differences 
were found for 6- to 9-year-olds in either accuracy or comprehension of oral readings of 
continuous text. So whether a difference in reading attainment favouring girls occurs seems to 
depend on the reading test used. 


Interactions between sex and extraversion 

Interactions have been found between sex and extraversion in their effect on scores on a 
number of different tasks: Riding and Dyer (1980) for recall of prose passages, Riding and 
Armstrong (1982) for performance on mathematics tests, Riding and Boardman (1983) for 
map reading performance, and Riding and Egelstaff (1983) for detection of change in prose 
passages. In the present study, therefore, we should expect an interaction between sex and 
extraversion in their effect on reading attainment. 


The purpose of the present study was to investigate the extent of the relationship between 
reading performance and extraversion by (1) considering performance on a variety of reading 
tests that ranged from no context (reading single isolated words) to substantial context 
(reading a paragraph) and which differed in presentation (individual or group), and (2) 
studying the relationship when reading aloud and reading silently for two levels of difficulty of 
prose material. 


METHOD 
Subjects 
The subjects were 48 7- to 8-year-old children who were selected from 77 children of that 
age in three schools in the manner detailed below. 


Materials 
A personality test, and three types of reading material, which varied in the amount of 
context provided, were employed. 


Personality Test. The Junior Eysenck Personality Inventory (S. Eysenck, 1965) was used. 


Reading Materials. (a) No context. The revised version of the Burt Graded Word Reading 
Test (Burt, 1976) was used as the reading test without context. The child is required to attempt 
to read aloud a list of 110 unrelated words printed five to a line. (b) Sentence context. The 
Primary Reading Test 2 (PRT2) (France, 1978) was selected as a silent reading test with some 
context. The task in this test is to choose from a set of five words the correct word to go in a 
blank space in a sentence. The test contains 48 sentences of increasing complexity. (c) Passage 
context. The prose reading materials were passages 2 (approximately 50 words) and 3 
(approximately 70 words) of Forms B and C and their associated comprehension test questions 
(eight per passage) of the Neale Analysis of Reading Ability (Neale, 1966). This provided two 
levels of difficulty. 


Procedure 

The JEPI was administered orally to a whole class at a time to ensure that possible 
reading comprehension difficulties did not invalidate the scores obtained. The children 
responded by ticking the YES or NO boxes on an answer sheet. 


The PRT2 and the silent-reading version of the Neale material were also given on a group 
basis. For the PRT2 children were allowed as much time as they required either to complete 
the test or to do as much as they could. For the Neale silent-reading material children were 
given 2.5 minutes, which was sufficient to allow the majority to read the passage at least three 
times. They were told that they would have to answer comprehension questions on it. 
Administration of the second, more difficult passage of the silent-reading Neale material 
followed the completion of the questions on the first passage. Children were allowed as much 
time as they needed to write down their answers to the questions and were not allowed to look 
back at the passage while they were answering them. 


The Burt test and the reading-aloud version of the Neale material were administered to 
individual children at the same testing session. The Burt test was given in the standard manner, 
the testing being stopped after the requisite number of reading errors on the child’s part. For 
the Neale material children were instructed to read through the passage three times, and then 
to write their answers to the questions without being able to look back at the passage. 


All children in the four classes were tested, except for those who were receiving extra 
tuition in English, so that 77 children completed the JEPI. Of these, two who were absent 
from school when the oral tests were administered, and those who scored above 8 on the lie 
scale of the JEPI were omitted from the analysis. The remaining children were grouped within 
sexes, on the basis of their extraversion score, into introverts, ambiverts and extraverts, and 
excess children were randomly excluded. This gave six groups of eight children. The 
extraversion groupings, with means respectively for boys and girls, were: extraverts, range 17-20, 
means 18.25, 18.13; ambiverts, range 14-16, means 15.00, 15.00; introverts, range 8-13, 
means 11.50, 11.50. 
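
The selection procedure described above can be sketched as follows. The score ranges are those reported for the final groups; treating them as the cut-offs actually applied is an assumption, as are the record layout (`sex`, `extraversion`, `lie`), the function names, and the use of a seeded random sample to drop excess children.

```python
import random

# Extraversion score ranges reported for the final groups (assumed to be the cut-offs).
GROUP_RANGES = {"introvert": (8, 13), "ambivert": (14, 16), "extravert": (17, 20)}
LIE_SCALE_MAX = 8  # children scoring above 8 on the JEPI lie scale were omitted


def extraversion_group(score):
    """Return the group name for an extraversion score, or None if out of range."""
    for name, (low, high) in GROUP_RANGES.items():
        if low <= score <= high:
            return name
    return None


def select_groups(children, group_size=8, seed=0):
    """Group children within sex by extraversion and randomly drop the excess."""
    rng = random.Random(seed)
    groups = {}
    for child in children:
        if child["lie"] > LIE_SCALE_MAX:
            continue  # omitted because of a high lie-scale score
        name = extraversion_group(child["extraversion"])
        if name is None:
            continue
        groups.setdefault((child["sex"], name), []).append(child)
    return {
        key: rng.sample(members, min(group_size, len(members)))
        for key, members in groups.items()
    }


# Hypothetical records: ten extravert boys, one of whom fails the lie-scale check.
sample = [{"sex": "boy", "extraversion": 18, "lie": 2} for _ in range(9)]
sample.append({"sex": "boy", "extraversion": 19, "lie": 9})
selected = select_groups(sample)
print({key: len(members) for key, members in selected.items()})
```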


RESULTS 


The results will be presented in two sections: (1) silent reading and limited context (Burt 
and PRT2 tests), (2) silent and oral reading with context for comprehension (Neale materials). 


(1) Unrelated word and sentence completion; silent reading 
Table 1 gives the mean raw scores on the Burt and PRT2 tests. 


TABLE 1 
MEAN RAW SCORES ON THE BURT AND PRT2 TESTS FOR 
DIFFERENT PERSONALITY GROUPINGS 

                      BURT              PRT2 
                  Mean     SD       Mean     SD 
BOYS    EXTRA    37.88   11.89     15.63    6.16 
        AMBI     29.38    9.54      8.75    2.54 
        INTRO    25.50    7.76      7.88    5.44 
GIRLS   EXTRA    27.50    5.07      8.25    5.26 
        AMBI     30.13    7.49     10.38    6.30 
        INTRO    34.75    9.02     11.88    6.57 
FIGURE 1 
BURT AND PRT2 SCORES FOR EXTRAVERSION GROUPS 
[Figure not reproduced. Mean reading test score plotted against extraversion group (I, A, E) for the Burt test and the PRT2, by sex.] 
A three-way analysis of variance with repeated measures was performed on the data. Of 
the three simple main effects, only Reading Test Type was significant, but since this is a 
reflection of the number of items on the tests, it will be disregarded. There was one significant 
interaction requiring consideration, that between Extraversion and Sex (F = 4.78; df 2,42; 
P < 0.05) and this interaction is illustrated separately for the Burt and PRT2 tests in Figure 1. 


The interactions of Extraversion with Sex on the Burt and PRT2 tests closely resemble the 
equivalent graph for the Neale material which is given in Figure 2 and will be discussed in the 
next section. 
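
The direction of the Extraversion by Sex interaction can be checked directly from the Table 1 means. The sketch below is illustrative only and is not the analysis of variance reported in the paper: it compares the extravert-minus-introvert difference for boys and girls and treats opposite signs as a crossover. The names `BURT_MEANS`, `PRT2_MEANS`, `extravert_minus_introvert` and `is_crossover` are hypothetical.

```python
# Cell means copied from Table 1.
BURT_MEANS = {
    "boys": {"extravert": 37.88, "ambivert": 29.38, "introvert": 25.50},
    "girls": {"extravert": 27.50, "ambivert": 30.13, "introvert": 34.75},
}
PRT2_MEANS = {
    "boys": {"extravert": 15.63, "ambivert": 8.75, "introvert": 7.88},
    "girls": {"extravert": 8.25, "ambivert": 10.38, "introvert": 11.88},
}


def extravert_minus_introvert(means):
    """Difference between extravert and introvert means, computed per sex."""
    return {
        sex: round(cells["extravert"] - cells["introvert"], 2)
        for sex, cells in means.items()
    }


def is_crossover(means):
    """True when the extraversion effect has opposite signs for boys and girls."""
    diffs = extravert_minus_introvert(means)
    return diffs["boys"] * diffs["girls"] < 0


for label, means in (("Burt", BURT_MEANS), ("PRT2", PRT2_MEANS)):
    print(label, extravert_minus_introvert(means), is_crossover(means))
```

A sign check of this kind only describes the pattern of means; whether the interaction is reliable is what the F ratio in the text addresses.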


(2) Prose passage comprehension; silent and oral reading 
The mean comprehension scores for the passages from the Neale Analysis are given in 
Table 2. 


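TABLE 2 
MEAN RAW SCORE (Max 8) ON THE NEALE MATERIAL 

[Only one Mean/SD column pair is legible; its reading mode (silent or aloud) and difficulty level (easy or difficult) are not recoverable. Row order is assumed to follow Table 1.] 

                 Mean     SD 
BOYS    EXTRA    5.63    1.58 
        AMBI     4.63    1.22 
        INTRO    4.38    1.49 
GIRLS   EXTRA    4.88    1.27 
        AMBI     5.25    1.30 
        INTRO    5.38    1.80 
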
An ANOVA with repeated measures was performed on the data. Of the four simple main 
effects, only Difficulty was statistically significant (F = 330.00; df 1,126; P < 0.001). This 
was clearly expected. There were three significant interactions requiring consideration: (a) 
Extraversion and Sex (F = 3.71; df 2,42; P < 0.05), (b) Difficulty and Mode (F = 24.00; 
df 1,126; P < 0.001) and (c) Difficulty and Extraversion (F = 3.94; df 2,126; P < 0.05). These 
interactions will be discussed in turn. 


DISCUSSION 


(a) Sex and extraversion. The interaction between Sex and Extraversion is shown in 
Figure 2. Boy extraverts scored higher on reading comprehension than boy introverts, whereas 
for the girls, introverts obtained higher scores than extraverts. As already noted, this is almost 
identical to the pattern for the other reading materials with less context. The reason for the 
interaction is not clear, but several studies of extraversion show a reversal of effects for boys 
and girls. Riding and Dyer (1980), Riding and Armstrong (1982), Riding and Boardman (1983) 
and Riding and Egelstaff (1983) all found interactions of sex with extraversion on various 
cognitive tasks. Riding and Boardman sum up their findings by saying that “where the boys of 
a particular extraversion level score above average the girls score below average and vice 
versa” (p. 76). The present results are in line with this observation. 


It is possible that extraversion is the social manifestation of an individual's level of 
physiological arousal and that the cognitive manifestation is the mode of representation of 
information in memory (Riding and Dyer, 1980). While almost all individuals can generate 
both verbal and imagery representations of information if they make a deliberate effort to do 
so, each person has a preferred mode of representation which is habitually and voluntarily 
used when analysing information that is seen or heard (Riding and Dyer, 1980). There is a 
continuum of habitual mode ranging from almost entirely verbal to almost totally imaginal, 
and learning style may be assessed by means of the Verbal-Imagery Code Test (Riding and 
Calvey, 1981). Further, a reasonable correlation (r = 0.7) has been found between 
imager-verbaliser style and introversion-extraversion (Riding and Dyer, 1980). Extraverts tend to be 
verbalisers and introverts imagers. 


FIGURE 2 
NEALE COMPREHENSION SCORES FOR EXTRAVERSION GROUPS 
[Figure: mean comprehension score (max 8) against extraversion group (I, A, E), by sex.] 


In the case of boys, the superior performance of the extraverts, who are likely to be 
verbalisers, would correspond to the present findings, although in the case of girls the results 
are the reverse. Since the correlation between extraversion and verbalising held for both 
boys and girls, the reason for the different effect for girls is not clear, except that it accords 
with the general finding that performance of the girls is the inverse of the boys. 


FIGURE 3 
COMPREHENSION AND PASSAGE DIFFICULTY 
[Figure: mean comprehension score (max 8) against passage difficulty.] 


(b) Difficulty and extraversion. The interaction of difficulty with extraversion is shown in 
Figure 3. The fact that on the difficult passages ambiverts obtained lower scores than either 
extraverts or introverts could be due to the ambiverts using dual coding (i.e., both verbal and 
imaginal) while the extraverts were biased to verbal coding alone and the introverts to imaginal 
coding. Dual coding implies a longer information-processing time than single coding, and this 
would make the difficult passage even more difficult for ambiverts than for introverts and 
extraverts. 


It may be noted that the pattern of results for extraversion was similar both with and 
without context, and there was no significant interaction between extraversion and reading 
silently or aloud. 


(c) Difficulty and mode. The interaction of difficulty with mode is shown in Figure 4. 
This finding is peripheral to the main area of interest of the present study, but presumably 
arises because, with easy material, reading aloud is a hindrance since it uses up 
processing capacity but does not improve comprehension, while with difficult material reading 
aloud ensures attention to all words and so improves comprehension. 





FIGURE 4 
NEALE COMPREHENSION SCORES FOR READING MODE 


Requests for reprints should be addressed to Dr. R. J. Riding, Department of Educational 
Psychology, Faculty of Education, University of Birmingham, PO Box 363, Birmingham, B15 2TT. 


Papers delivered at the Annual Conference of the Education Section of the 
British Psychological Society, held in the University of York, September, 1985. 


ALEXANDER, R., Child Advisory and Psychology Service, Birmingham City Council. 


SOCIAL SKILLS TRAINING — THE FACT AND THE FICTION 


Social Skills Training was first used by educational and clinical practitioners on the basis 
that if you could spot the differences between the socially competent and the socially inept you 
could ‘top up’ the skills of the inept in precisely those areas in which they were deficient — and 
so help them to a more satisfying and fulfilling life. The first skill deficits noticed amongst 
people regarded as inadequate were non-verbal deficits such as poor eye contact, poor posture 
and so on, and many of the early social skills training programmes attempted to teach these 
skills directly, increasing eye contact, improving posture. 


As the research effort yielded greater amounts of data on the non-verbal components of 
social interaction it became clear that the relationship between the components and a skilled 
social performance was by no means a simple one and the early assumptions that the more the 
eye contact the more skilled the performance, were never substantiated. More recently there 
has been considerable research effort directed at the analysis and teaching of the complex 
verbal components of social interaction, with somewhat mixed results. The very complexity of 
these components renders their identification and training exceedingly difficult especially 
when the objective is to teach the skill components to grossly inadequate people. 


Social skills training has been practised now for more than 20 years and if the work is to 
be profitably continued a number of issues must be faced by practitioners. These may be 
summarised as follows: 


(1) Are the differences between competent and incompetent social behaviours that are 
observed essential differences? 

(2) Are the situations in which incompetent social behaviour is observed ‘ecologically’ 
valid? 

(3) What training situations — if any, will permit behaviour taught to generalise to 
significant real life situations? 


These issues were considered in the context of recent research by the author on social 
skills training with young offenders. 


BEVAN, K., and WHELDALL, K., Department of Educational Psychology, University of 
Birmingham. 


A TOUCHING WAY TO TEACH 


The importance of the role of non-verbal communication in education has been 
increasingly recognised during the past fifteen years. This paper outlines some recent research 
carried out by the authors into the tactile aspects of primary teachers' non-verbal behaviour, 
an aspect that has been virtually unexplored in British classroom research. 


In order to begin to document the role that teacher touch behaviour plays in classroom 
communication, an observational study was conducted on the use of teacher touch in relation 
to the categories of touch used and the pupils’ body areas touched. These touch behaviours 
were then analysed according to pupils’ age and sex. Thirty-six female teachers were observed 
during the course of the project (12 nursery class teachers, 12 reception class teachers, 12 post- 
reception class teachers). The collected observational data were analysed according to a 
category analysis developed for this particular project. One of the interesting findings from 
the research was that teachers’ touch behaviour was significantly different in respect of the sex 
of the pupil, i.e., boys were touched significantly more frequently than girls in all three of the 
age-ranges. There were no significant differences found between the amount or type of touch 
received by pupils of different age-ranges. 


CROOK, C., Department of Psychology, Durham University. 


STYLES OF COLLABORATIVE PROBLEM SOLVING MAINTAINED BY PRIMARY 
SCHOOL SOFTWARE 


The gains associated with organising collaborative learning experiences in primary 
classrooms have been documented in a number of independent reviews. However, there have 
been few attempts to identify what takes place during peer interaction that might underlie its 
advantages. Computer-based problem-solving has been identified as a powerful medium for 
supporting collaborative thinking. The present paper describes the characteristics of children’s 
overt reasoning as provoked by selected examples of primary software. The interactions of 12 
pairs of top infants were fully recorded during five separate sessions. Transcripts indicate that 
the extent to which critical thinking emerges as a social phenomenon is strongly determined by 
structural features of the computer activities. The conclusion ventured is that the medium 
reveals unusual potential as a resource for collaborative problem solving; however, this 
potential depends on a close attention to how various cognitive challenges may be most 
effectively realised in the design of computer tasks. 


DOCKRELL, B., Scottish Council for Research in Education, Edinburgh. 


REPORTING ASSESSMENTS OF PUPILS’ BEHAVIOUR AND ATTITUDES 


There is in the UK increasing interest in reporting pupils’ attitudes and behaviours. 
Indeed reports of this kind have the official support of the Secretary of State for Education. 
The paper reports studies of the response of teachers and parents to the inclusion of these 
assessments of pupils in reports, both by the schools to parents and by the schools to 
employers. 


The paper draws on a questionnaire study of a large sample of Scottish teachers and an 
interview study of parents in five representative schools. 


Virtually all teachers are agreed that it is appropriate that assessments of this kind should 
be made and used as the basis for school references or testimonials but are about equally 
divided on the appropriateness of including formal assessments of these characteristics in 
school reports, either during the course of schooling or at its conclusion. 


Parents too are divided. Some see the moral education of children as being a joint 
responsibility of school and family which can only be effectively carried out if there is 
appropriate communication between them. Others regard assessment of affective 
characteristics as outside the teacher’s competence and outside the responsibility of the school 
system. 


Brief reference is made to a study of procedures which enable such assessments to be 
soundly based. The two studies of perceptions were funded by SSRC, the other of the 
profiling procedure by SED. 


EDWARDS, A., and FONTANA, D., Department of Education, University College, Cardiff. 


THE CONSTRUCTION OF A PRE-SCHOOL SELF: THE NURSERY SCHOOL AS AN 
ENABLING ENVIRONMENT 


Four-year-old children’s construction of a “school self” as a means of negotiating within 
the developmental opportunities afforded by the nursery school environment is examined 
through a modification of repertory grid technique. Children’s perceptions of child/adult 
relations revealed in the data are discussed within the framework of the power of relationships 
that exist in the nursery school. Taking as axiomatic Taylor’s (1977) description of self as 
“responsible agent”, implications for the learning situation are drawn, to suggest that 
child-teacher intersubjectivity may be a fruitful alternative to dependency. 


ELITHORN, A., and STAVROU, A., Royal Free Hospital and Institute of Neurology, London. 


LEARNING STYLES AND LEARNING SKILLS 


In the United States educationalists argue that a “mismatch” between a pedagogue's 
teaching style and a pupil's individual learning style is a major cause of frustration, failure and 
behaviour disorder in children and adolescents (Carbo, 1982). In this country recent research 
sponsored by the IBA suggests that individual differences in imaging style in children may be 
relevant to media-based skills acquisition (Wober, 1985). 


In the present paper we present the result of a school study using a teacher-friendly 
automated test system (Elithorn et al., 1982) which demonstrates relationships between 
imaging and intellectual skills which are sex-related and sufficiently marked to warrant further 
study. 


FREEMAN, A., Department of Educational Research, University of Lancaster. 


SOCIAL COMPETENCE AND SPECIAL EDUCATIONAL NEEDS: CURRICULUM 
IMPLICATIONS 


This paper explains the process of the development of a curriculum based on Stephen 
Greenspan's model of social competence. 


The theoretical basis of the model is explained. It is essentially a cognitive model and 
combines social intelligence and some aspects of emotional intelligence as defined by 
Greenspan. Its major difference lies in the absence of those items usually termed self-help 
skills such as cooking, tying shoelaces etc. 


Of the three basic areas within social competence of temperament, character and social 
awareness, the aspect which has been worked on is that of social awareness, since this has been 
identified as the area which would enable most change to occur through teaching. 


Within the area of social awareness are three aspects, which are further broken down into 
seven discrete areas of role-taking, social inference, social comprehension, psychological 
insight, moral judgment, referential communication and social problem-solving. 


It is within these seven areas that the team of four teachers and researcher have been 
attempting to define the parameters of a curriculum, through bringing different perspectives 
forward for discussion and through practice. 


HUGHES, M., University of Exeter, and MACLEOD, M., University of Edinburgh. 


THE EFFECTS OF LOGO ON CHILDREN'S THINKING 


There has recently been a good deal of interest and enthusiasm throughout the 
educational world in the programming language LOGO. Claims have been made — notably 
by Seymour Papert of Massachusetts Institute of Technology — that using LOGO will have 
far-reaching positive effects on children's thinking and development. 


This paper starts by giving a brief introduction to LOGO, focusing specifically on the 
most well-known aspect of LOGO, namely Turtle Graphics. With Turtle Graphics the child 
writes programmes to control the movements of a Turtle — either a real floor-crawling robot 
or a simulated object on a screen. Both kinds of Turtle have pens which can leave a trail 
behind them, thus allowing the child to draw pictures or patterns on the floor or on the screen. 


The paper then goes on to examine some of the claims made about LOGO, paying 
particular attention to the areas of creativity and critical (or reflective) thinking. The limited 
amount of evidence currently available suggests that children using LOGO are indeed likely to 
show these aspects of thinking while at the keyboard: however, there is as yet little evidence 
for any more general effects. 


JAEGAR, M. E., and NORWICH, B., Department of Child Development and Educational 
Psychology, University of London Institute of Education. 


THE ROLE OF ABILITY AND AFFECTIVE AND CONATIVE FACTORS IN SCHOOL 
LEARNING 


A model describing the antecedents and consequences of intended and actual learning 
behaviour, outlined in the Norwich/Jaegar paper, was tested with respect to gains in spelling 
achievement in a sample of 4th-year junior pupils. Beliefs about and evaluations of learning 
outcomes, attitudes and subjective norms concerning learning to spell, self-perceptions of 
spelling ability, intended and actual learning behaviours and spelling achievement were 
assessed twice over a 6-month period in four 4th-year junior classes. This paper reports an 
analysis of the data that bears on the relative contribution of initial spelling levels and 
motivational factors in predicting subsequent achievement. Methodological issues are discussed in 
addition to future research directions. 


KYRIACOU, C., Department of Education, University of York. 


THE PSYCHOLOGY OF EFFECTIVE TEACHING: IS A CONSENSUS EMERGING? 


Research on effective teaching in schools over the last two decades has largely focused on 
one of three approaches: (1) an analysis which focuses on the notions of ‘active learning time’ 
and ‘quality of instruction’, (2) an analysis which focuses on psychological concepts, 
principles and processes, and (3) an analysis which attempts to identify key classroom teaching 
skills. More recently, a significant shift in attention has occurred among researchers and those 
discussing the role of the psychology of education in contributing to the development of 
effective teachers during initial teacher training courses, towards looking at the nature of a 
complex three-way interaction between ‘teacher strategies and perceptions’, ‘pupil strategies 
and perceptions’ and ‘characteristics of the learning task, activities and experience itself’, and 
how this interaction may relate to the effectiveness of the learning experience. This shift is 
implicit in a number of recent writings (see for example Doyle, 1983; Kyriacou, 1985; and 
various contributions to Entwistle, 1985, particularly Bennett, Child, Desforges and 
Tomlinson), most notably in discussions of ‘cognitive matching’ and ‘teacher-pupil cognitive 
and affective communication’. A consensus appears to be emerging here on which crucial 
considerations need to be taken into account in discussing the basis of effective teaching, of 
which the most prominent are: (1) how context variables influence classroom processes, (2) 
how pupils actively structure and cope with classroom demands, (3) how the learning 
experience (both cognitive and affective) is related to each pupil’s cognitive and affective 
psychological state and pupils’ consequent appraisals, and (4) how a teacher monitors his/her 
own performance and monitors and evaluates each pupil’s learning. 


LIGHT, P., COLBOURN, C., and MITCHELL, J., Department of Psychology, University of 
Southampton. 


SOCIAL PROCESSES IN MICROCOMPUTER USE IN SCHOOLS 


The potential of microcomputers for facilitating constructive peer interaction and 
collaborative learning has been much remarked but little studied. We have been studying 
social processes in problem solving by schoolchildren using microcomputers for several years, 
but using short-term experiments in rather contrived settings. We are presently engaged in an 
ESRC funded study of the acquisition of microPROLOG by 12-year-old children, and this 
study has provided a setting within which to observe interactions between children working 
under somewhat more ecologically valid conditions. 


Our use of microPROLOG (and in particular its elementary database handling facilities) 
has been in the context of a humanities foundation year in a comprehensive school. 
Approximately 60 children had weekly sessions for rather over one term, working in groups of 
eight. In some of these groups each child had his or her own Spectrum microcomputer. In 
others pairs of children worked together on a total of four machines. In others again, only two 
machines were available and children worked in groups of four. 


Sources of data for analysis include measures of the individual children's grasp of 
microPROLOG and several attitude measures, together with videotapes made during the 
classroom sessions, excerpts from which will be used to illustrate the presentation. 


LOWENSTEIN, L., Fair Oak, Hampshire. 


INTEGRATION VERSUS SEGREGATION 


Partly for political reasons and partly because of the desirability of integration, there has 
been a tremendous desire to assimilate many children previously termed ‘in need of special 
education’ into the ordinary educational system. In the United States this is termed 
‘mainstreaming’. The desire for this is justified since many children can be integrated into the 
normal system and benefit from such integration. 


There are, however, children who are so disturbed or so much in need of special 
education and treatment that their integration into the normal system is likely to be to their 
own disadvantage as well as to the disadvantage of the group to which they are integrated. 


It is suggested that there is no black and white thinking on this matter but that integration 
depends on the degree of handicap or special need rather than ideological or political or even 
psychological preferences. 


There is, therefore, a place for special education both residential and day for those young 
people who require such help because they are unable to make progress in the ordinary system 
of education and are disturbing influences there. Such children require small groups and 
individual teaching with specialist teachers who understand their problems as well as the need 
for developing their academic potential. This is not possible in the ordinary school or 
classroom for some children. 


NORWICH, B., and JAEGAR, M. E., Department of Child Development and Educational 
Psychology, University of London Institute of Education. 


UNDERSTANDING SCHOOL ACHIEVEMENT IN TERMS OF CHILDREN'S BELIEFS 
AND EVALUATIONS ABOUT LEARNING PROCESSES AND OUTCOMES 


This paper is concerned with constructing a theoretical model of the relationships 
between attitudes, beliefs, learning behaviours and school achievement. The model involves 
the relationships between personal judgments about ability, attitudes and subjective norms in 
terms of their influence on intended and actual learning behaviour in school and subsequent 
gains in school achievement. The constructs and their relationships are derived from Ajzen 
and Fishbein’s (1980) ‘theory of reasoned action’, Bandura’s social learning theory (1977) and 
Bloom’s model of school learning (1976). The paper discusses how such theorising has been 
adapted to understanding aspects of school learning to take account of the nature of the 
activity of learning a curriculum subject and the context of schooling. 


Pan, J. F., Senior Educational Psychologist, Lancashire Education Authority. 


ADJUSTMENTS IN RESIDENTIAL SPECIAL EDUCATION 


Independent residential special education is characterised by its finite organisation, the 
directness of its responsibilities and substantial accountability. This necessity encourages the 
development of systems to meet the immediate and individual needs of its pupils. The 
adequacy and imagination of those systems are amongst the factors which are necessary to 
affirm the effectiveness of the organisation. 


The content of this paper gives account of a particular school's attempt to develop 
methods and materials through the co-operative involvement of its entire staff. Against the 
background of substantial literature, experience and practice elsewhere, that content would 
not pretend to be providing answers. It is, perhaps, the implicit processes which have brought 
about that development which will be of interest. 


The Warnock report and the recent legislation on behalf of children with special 
educational needs will continue to challenge and stimulate educational thinking and practices. 
Both report and legislation give definition to the extensive range of special educational need 
and both address themselves to the question of integrated educational provision. We are daily 
aware of the disparity between the needs of such provisions and the currently available 
national resources. The form of integrated provision, as dictated by resources, is an area for 
evolutionary development. Children who express special needs with regard to their emotional 
and social behaviour will exact substantial demands upon resources and upon skills of the 
provision to which those resources might be applied. 


It is possible for residential special education to develop its expertise in relative isolation. 
Continuing developments which, for example, will define and meet the objectives of 
reintegration more precisely, can proceed in partnership between residential schools. This 
would be towards the more effective use of resources within a range of regional provision. In 
parallel, such initiative can proceed towards the goal of more interaction with mainstream 
education. That interaction could allow for the exchange of the skills and information which 
would help to sustain integration where it is appropriate and possible. 


PORTEOUS, M. A., and KELLEHER, E., Department of Applied Psychology, University 
College, Cork. 


SCHOOL CLIMATE AND ADOLESCENT PROBLEMS 


Various studies have shown a relation between school climate, structure and organisations 
and pupil outcomes. Outcomes such as delinquency, academic success, personality and 
motivation etc. have been related to both pupil and teacher perceived views of a school's 
psychological climate, its size, physical structure, and type of organisation (Gump, 1980; 
Rutter, 1979). 


In the course of research on adolescent problems, Porteous and Fisher (1980) noted 
differences in the admission rates by adolescents in different schools to items on an adolescent 
problem checklist. 


The present study examined the relation between pupil perceived school climate, using 
Finlayson's questionnaire, which categorises climate into four variables, Emotional Tone, 
Task Orientation, Control and Concern, (Finlayson, 1972), pupil problem admission using the 
Porteous Problem Checklist (Porteous, 1985), and certain school-type variables. 


The study was carried out in Ireland, in 15 schools, which were categorised into four 
groups depending on their ‘official type’ in the Department of Education statistics. The 
schools were also classified as to whether they were run by religious orders, or not so run. 


Using sex as a covariate, differences emerged between school types on both climate and 
problem admission variables. An interpretation of type and climate effects on problem 
admission is made. 


POWELL, S. D., and RIDING, R. J., Department of Educational Psychology, University of 
Birmingham. 


THE FACILITATION OF THINKING SKILLS IN PRE-SCHOOL CHILDREN USING 
COMPUTER PRESENTED ACTIVITIES 


Thirty-six 4-year-old children were given the Raven's Coloured Progressive Matrices and 
were then randomly divided within sexes into two groups. A treatment group worked once 
through 16 computer presented problem-solving activities while the second was a control 
group and did not do the activities. All children were then re-tested on Raven's Matrices. The 
results showed a significantly greater improvement between the pre- and post-test for the 
treatment group than for the control group. The results were discussed in terms of the 
development of thinking skills. 


In a second experiment, a different sample of 60 4-year-old children were given the 
Raven's Coloured Progressive Matrices and were then treated in a manner similar to that 
employed in the first experiment except that the treatment group did the activities to a 
minimum criterion of 70 per cent accuracy. The results were similar to those of the first study 
but there was a greater pre-post-test improvement in the case of the treatment group. 


SHARP, A., Department of Education, University of Edinburgh. 


CLASSROOM DISSENT AND THE INEFFABLE 


This paper offers some reflections upon issues raised by ethnographic investigations of 
behavioural problems in the classroom. The author's work has employed the concept of 
dissent to represent features of pupils' problem behaviour (misbehaviour, indiscipline, 
classroom disruption, deviance, acting out, etc.) which can be considered routine and 
unexceptional within the institutional setting of schooling. Descriptive analyses of classroom 
misbehaviour render the participants’ accounts but do not necessarily explicate the conditions 
which produce the behaviour in question (with the related accounts). The implicit nature of 
these conditions, and their relationship to actual patterns of classroom interaction, form the 
focus of the paper's considerations. 


For the researcher with an investigative brief, for the psychologist with a consultative 
brief, and for the teacher with an educative brief, similar issues are encountered. How are the 
implicit and consequently ineffable conditions to be articulated — by whom, in what context, 
through what process . . .? What part is played by “theoretical” formulations in explaining 
behavioural problems? Are the implicit conditions of theorising, of explaining, and of 
accounting (justifying) linked to the implicit conditions producing behavioural problems? 
Deriving from research examples, a conceptual framework for understanding these issues is 
proposed and some of the consequences for professional practice are considered. The paper 
draws attention to the problematic nature of person-institution relationships which can be 
seen to encompass the construing activities of pupils, teachers, researchers and psychologists. 
A major implication is then a requirement for multilateral programmes of action rather than 
unilateral intervention strategies. 


SINSON, J. C., School of Education, University of Nottingham, and WETHERICK, N., 
Department of Psychology, University of Aberdeen. 


INTEGRATED YOUNG DOWNS CHILDREN (VIDEOTAPE) 


Earlier studies by Sinson and Wetherick of the introduction of Downs children (aged 3 to 
6) into groups of normal children showed that the Downs child remained isolated in the group. 
An observational study (using videotape) was conducted of one-to-one same sex play 
encounters between Downs children and 104 normal children aged 2½-6 in the normal child's 
own home and in the presence of both mothers. The Downs children again showed themselves 
able and willing to interact with the normal child but the normal children were clearly aware of 
the Downs child's abnormality (though without relevant previous experience) and their 
behaviour showed evidence of the exaggeration and prolongation of patterns of behaviour 
involving gaze that are normal between children meeting for the first time — but quickly 
disappear in a normal/normal encounter. Similar distortions were observed in both patterns 
of play and vocalisation by the normal children. 


Only the youngest Downs children (3-year-old) were acceptable to the normal children — 
possibly because they were classified as ‘babies’ and treated accordingly. This may explain 
why the earlier study found no problem in interaction between Downs children and their 
siblings in the family situation. Suggestions are made to help make integration a less painful 
process for many Downs children. 


SUTHERLAND, P. A. A., St. Mary's College of Higher Education, Twickenham. 


A LONGITUDINAL STUDY OF VALUE CHANGE IN LATE ADOLESCENCE 


A cohort is being followed through four years of college study, focusing on religious and 
moral values. By means of a Likert scale plus a value priority scale, means are established for 
comparison between the years. By means of interviews, insight is sought into the fundamental 
values of the subjects and the chief formative influences on them. 


The main hypothesis is that there will be a swing away from the conservative values of the 
school and home background to the liberal radical values of the college environment. 


Results so far show a slight tendency in this direction; differences are statistically 
significant. The non-Catholic sub-sample has more traditional Catholic values than does the 
Catholic sub-sample. A number of patterns of development emerged; a common one being the 
formation in the 6th form of a liberal peer group consensus on moral values, but the retention 
of conservative religious values into the college years when they then come under challenge. 
Generally the teaching against abortion is supported but the prohibition on contraceptives is 
not. 


THOMSON, C. O. B., and BUDGE, A., University of Edinburgh, BUULTJENS, M., and LEE, M., 
Moray House College of Education, Edinburgh. 


MEETING THE SPECIAL EDUCATIONAL NEEDS OF VISUALLY IMPAIRED 
CHILDREN 


Operating under a 12-month research grant from the Scottish Education Department, the 
research team have examined the process of meeting the special educational needs of a 
particular handicapped group. The dimensions of the population of visually impaired children 
in Scotland are described together with an account of how such children have their special 
educational needs ‘recorded’ under the provisions of the Education (Scotland) Act 1981. By 
consensus the population of visually impaired children presents fewer problems in terms of 
decisions on placement, etc. than other groups and thus affords an opportunity to examine the 
process of ‘recording’ in a manner likely to be less contaminated by other factors. 


The study reports a mixture of methodologies used in the conduct of the enquiry. These 
are: 


(i) Questionnaire data from education authorities in respect of basic descriptive 
statistics of incidence and provision etc. 

(ii) Questionnaire data from parents yielding information on nature of diagnosis, 
degree of sight of the child, nature of placements past and present, attitudes to 
support services, etc. and also extent of involvement in placement decisions. 

(iii) Interview data from education officers and psychologists involved with the drawing 
up of the Record of Needs of visually impaired children; interview data from a 1:10 
sample of participating parents on how they perceive the process of greater 
involvement in decisions concerning their children as provided for in legislation. 


The mixture of quantitative and qualitative data illuminates the ‘recording’ process in a 
manner which suggests — (a) a variation in education authority practice in respect of 
‘recording’; (b) a degree of scepticism amongst psychologists about the ‘recording’ process; 
(c) a trend towards more integration in mainstream settings and (d) a significant shift in the 
nature of the special school population. 


While the study has concentrated on the visually impaired population it nevertheless 
affords useful insights into how the ‘special educational needs’ legislation is working and 
provides a basis for developing a research enquiry which considers that legislation across the 
spectrum of handicaps. 


WHELDALL, K., and METTEM, P., Centre for Child Study, Department of Educational 
Psychology, University of Birmingham. 


BEHAVIOURAL PEER TUTORING: TRAINING 16-YEAR-OLD TUTORS TO EMPLOY 
THE ‘PAUSE, PROMPT AND PRAISE’ METHOD WITH 12-YEAR-OLD REMEDIAL 
READERS 


Eight 16-year-old, low achievement pupils were trained to tutor reading using the ‘Pause, 
Prompt and Praise’ method. The effectiveness of training such tutors was investigated 
through a tutorial programme in which these eight older pupils tutored eight 12-year-old 
remedial children who were retarded in reading. The programme consisted of 24 tutorial 
sessions conducted over eight weeks. Two matched control groups of remedial readers were 
also included in the experiment. One consisted of eight pupils tutored by a group of eight 
untrained tutors who tutored during the same sessions using the same materials. The second 
control group consisted of a third group of remedial readers who read silently, without a 
tutor. The experimental group of tutees, who had a mean pre-test reading age of 8 years 4 
months, made a mean gain of 6 months in reading accuracy by the end of the programme. The 
tutees of control group I who had received tutoring from untrained tutors made a mean gain 
of 2.4 months. The pupils of control group II who read silently without a tutor made a mean 
gain of 1.8 months. Analysis of covariance showed the gains of the experimental group to be 
statistically significantly different from the gains of the two control groups. 


WOLFENDALE, S., N.E. London Polytechnic, TOPPING, K., Kirklees Metropolitan Council, 
and HEWISON, J. 


SYMPOSIUM: PARENTAL INVOLVEMENT IN CHILDREN'S READING 


The symposium intends to give a wide ranging over-view of recent developments in this 
area of explosive growth in education. There is a brief review of the historical and 
international perspective on recent developments. Various different techniques for involving 
parents in their children’s reading are then considered in terms of techniques, training, 
organisation, evaluation and likely future usage. Initially schemes for “Parent Listening” are 
examined, before proceeding to some of the more structured techniques. An overview of the 
Paired Reading method is presented, followed by a presentation concerning a variety of 
Behavioural Techniques, including reinforcement, Direct Instruction, Precision Teaching and 
Pause, Prompt and Praise. A necessarily brief exploration of a number of variations on and 
combinations of methods follows. Some time is devoted to a review of planning, training and 
other organisational aspects, although these cannot be dealt with in detail in the context of the 
symposium. 

In conclusion, the implications of recent work in this area for an understanding of how 
children learn to read are scrutinised. The challenge presented by recent work to existing 
understandings of the teaching of reading by teachers and academics is aired. Directions for future 
pragmatic development and for future research are outlined. The symposium concludes with 
questions and discussion with the panel. 


SYMPOSIUM: CHILDREN AND TELEVISION 


SHEEHY, NOEL P., Department of Psychology, University of Leeds. 


THE REPRESENTATION OF CHILDHOOD IN BRITISH TELEVISION ADVERTISING 


Anthropological and historical studies have demonstrated that the idea of childhood is 
culturally and historically specific. Relatively little is known about contemporary ideologies of 
childhood, nor how those ideologies structure and inform the science and technology of child 
development. Television commercials provide a potentially rich data source, since the images 
they portray reflect and influence social values. Through an analysis of the relationship 
between children and the media, Postman (1983) has argued that the traditional concept of 
childhood is disappearing. 

This paper reports an analysis of representations of childhood in British television 
commercials transmitted during 1984. 488 separate advertisements were analysed, of which 88 
were found to depict children. Analyses of advertisements in terms of characters, product, 
format, theme and script used in their production are reported. Coders’ subjective 
impressions of the ‘image’ of childhood portrayed in the advertisements indicated that they 
considered the portrayal ‘naturalistic’ in only 34 per cent of advertisements. In a further 34 per 
cent the image was considered ‘angelic’. Images of the child as ‘miniature-adult’ (16) or impish 
(‘little devil’, 16) were recorded less frequently. It is concluded that representations of children 
in advertisements may be characterised in terms of a limited number of portrayal patterns or 
images. Future research should address adults’ and children’s awareness of those images, and 
particularly children’s evaluations of images of childhood. 


BROWN, RAY 


CHILDREN AND TELEVISION. THE HIDDEN CURRICULUM 


“It comes in two parts. The first, one group gets into trouble and the other bails them 
out. The other part he goes somewhere with the family. . . .” These are the words of a 
7-year-old American child describing, of course, a television series. They reveal a depth of 
understanding of the stereotyped plot and its mechanisms which is equal to that of most adults. 
Children are sophisticated viewers, but research is prone to ignore this fact. In effects studies, 
for instance, children become fodder for experiments which fail to allow for the complexity of 
the child's relationship with television content. 


There are several levels at which we can explore children's viewing and their under- 
standing of their experience. The question is which level is appropriate to a particular problem 
or concern. Unfortunately much research in this area has left the question unanswered, 
indeed, unasked. The result is that 30 or more years of research has had little significant effect 
upon our understanding of the part played by television in children's lives. 


In this paper I argue that a holistic, child-centred approach is an essential basis for 
improving our knowledge of the impact of television on children. Examples of such an 
approach, using sympathetic interviewing and observation (both formal and informal) are 
given. The lessons learned by children from television are often unexpected. 


LEARMONTH, JAMES. 


POPULAR TELEVISION AND SCHOOLCHILDREN 


Young people between 5 and 16 spend, on average, over 20 hours a week watching 
television. Many of their favourite programmes are broadcast in the evening when television is 
catering primarily for family and adult audiences. These programmes are rightly seldom 
expected to “teach”, but they are a source from which young people learn and on which they 
may draw in the formation of their attitudes and values. 


Teachers have traditionally reacted to popular TV with hostility or suspicion, believing 
that it undermines the values which schools uphold. Systematic teaching about popular 
television has normally been confined to “media studies” in the 4th, 5th or 6th year of 
secondary school, taught often by English or social studies teachers. Many primary teachers 
and non-specialist secondary teachers wish to take their pupils’ experience of popular TV into 
account in the course of their normal teaching, but express anxieties about their competence to 
do so, often saying that they received no help in this area during initial training. 


In January, 1982, the Secretary of State for Education and Science, Sir Keith Joseph, 
asked HMI to convene a group of teachers to comment on the values and images of adult life 
presented in a series of popular evening BBC and ITV programmes. In the wake of their 
report, Popular TV and Schoolchildren (DES, 1983), HMI have convened ten regional 
working groups in England so that teachers, teacher-trainers, advisers, broadcasters and 
parents could engage in tasks related to their shared responsibilities in the context of young 
people and television. 


Br. J. educ. Psychol., 106-108, 1986 
CRITICAL REVIEW 


A NEW VIEW OF HUMAN INTELLIGENCE 


By H. J. EYSENCK 
(Institute of Psychiatry, University of London) 


STERNBERG, R. J. (1985). Beyond IQ: a Triarchic Theory of Human Intelligence. Cambridge: 
Cambridge University Press, h/b £25.00, p/b £8.95. 


Robert Sternberg, of Yale University, first became known in psychology as an advocate 
of the componential theory of intelligence, which in many ways is a development of 
Spearman’s neogenetic theory according to which the three major aspects or components of 
intelligence are the apprehension of experience, the eduction of relations, and the eduction 
of correlates. In his componential model Sternberg broke down these very general classes, and 
used a variety of experimental paradigms for isolating the information-processing origins of 
general intelligence. In these experiments, he endeavoured to establish parameter estimates for 
latency components of such aspects of the task as encoding, comparison, justification and 
response. His contribution was both important and novel, and has rightly been praised by 
many expert judges. In his new book he presents a new “triarchic” theory of human 
intelligence which he states is to some extent an outgrowth of the componential theory, which 
now is only one of three sub-theories. The first of these he calls a “contextual” sub-theory, 
which specifies how intelligent behaviour is defined in large part by the socio-cultural context 
in which this behaviour takes place. It involves adaptation to the given environment, selection 
of a more nearly optimal environment than the one actually inhabited, and the shaping of the 
present environment so as to render it a better fit to one’s skills, interests or values. 


The second sub-theory he calls “experiential”; it states that for a given task or situation, 
contextually appropriate behaviour is not equally “intelligent” at all points along the 
continuum of experience with that behaviour or class of behaviours. Intelligence, he suggests, 
is best demonstrated when one is (a) confronted with a relatively novel task or situation, or is 
(b) in the process of automatising performance on a given task or in a given situation. 


The third sub-theory is a development of his componential view, and specifies the 
structures and mechanisms that underlie intelligent behaviour. “The componential sub-theory 
completes the triarchy or specification that define the extent to which given behaviour is 
intelligent.” This, in brief, is Sternberg’s triarchic theory; can it be said to be an advance on 
his more restricted componential theory? 


Let me state first of all that the book is always interesting, full of novel ideas and 
experimental data which throw light on human behaviour. Consequently it should certainly be 
widely read, and while it is written in an overblown, sesquipedalian style that makes it often 
quite difficult to follow what the author is saying, trying to understand him is nevertheless well 
worth the effort. On the whole, then, the book is a genuine contribution to psychological 
theory. What I would doubt very much, however, is whether it is a contribution to the theory 
of intelligence. 


Sternberg’s triarchic theory seems to deal with human behaviour in general, as far as it is 
adaptive. Now I think it should be realised that of all human behaviour, even of adaptive 
behaviour, only a relatively small proportion is relevant to the concept of intelligence. 
Sternberg is, of course, entitled to redefine intelligence in any way he sees fit, but this 
disregards the established meaning of the term, and is hence extremely misleading. Sternberg 
would include in the term such factors as personality, mental disorder, past learning and 
experience, and many other factors which have a bearing on adaptation, but which are 
unrelated to traditional concepts of intelligence. To take but one example, Terman in his 
Studies in Genius found that the high IQ children he studied were nearly all successful in their 
later adaptations to life, but that a small proportion were unsuccessful and ended up in dead- 
end occupations, earning little money, and being relatively unhappy (Oden, 1968). These, he 
found, had as children been rated as neurotic, emotional or disturbed. In other words, 
neuroticism interferes with the worldly success otherwise likely to follow from the possession 
of a high IQ, and hence neuroticism would be included by Sternberg in his very wide concept 
of intelligence. However, intelligence as psychometrically defined is uncorrelated with 
neuroticism, and it would seem a much more reasonable and indeed more scientific way of 
dealing with the problem to regard neuroticism and intelligence as separate factors which 
interact and jointly may determine a person’s adaptation to the problems of everyday life. 


We may with advantage go back to the old notion introduced by Donald Hebb (1949) and 
Philip Vernon (1979) of Intelligence A, Intelligence B and Intelligence C. Intelligence A, or 
biological intelligence, refers to the biological substratum of all cognitive behaviour 
(problem-solving, learning, memory, invention of strategies, etc.), while Intelligence B (social 
intelligence) refers to the use made by a person of Intelligence A in his everyday life, and hence 
encompasses a large number of factors which are independent of Intelligence A, such as 
personality, temperament, physique, education etc. Intelligence C, in this scheme, is 
psychometric intelligence as defined by IQ tests. These are relatively good measures of 
Intelligence A, but are also to some extent vitiated as pure measures of a scientific concept by 
the same factors that are influencing Intelligence B. From the point of view of science, which 
relies on analysis rather than on confounding of clearly differentiated concepts, Intelligence A 
is surely the only meaningful connotation of the term “intelligence”; Intelligence C is 
acceptable in so far as it has been demonstrated to be a good measure of Intelligence A, but 
Intelligence B is too variegated to be scientifically useful, and indeed resembles in many ways 
popular conceptions of intelligence, as Sternberg indeed emphasises. He in fact welcomes this 
agreement with the popular view, and explicitly disagrees with the notion that his conception of 
intelligence is over-inclusive. On this point readers will have to make up their own minds; I do 
not feel that popular ideas should determine scientific concepts, but opinion on this point may 
vary. 


Some readers may query the very existence of Intelligence A, and take the view of Binet 
that intelligence is a statistical artefact, being merely the average of a number of independent 
abilities. Furthermore, they may argue that there is no evidence for “biological intelligence”. 
Such a view would be very difficult to maintain nowadays. The two most recent studies of the 
genetic contribution to IQ tests, from the U.S.S.R. (see Eysenck, 1982) and Norway (Tambs et 
al., 1984) respectively, give heritabilities of 78 per cent and 85 per cent, which agree pretty 
well with the value of 80 per cent found by Fulker and myself in our reanalysis of all existing 
data (Eysenck, 1979). Thus clearly there is a powerful biological determinant behind 
intelligent behaviour, as measured by psychometric tests. 


In recent years there has been additional proof for this from psychologists who went back 
to Galton's original notion that intelligence could best be measured by physiological tests; 
Galton himself suggested reaction time as one possible measure. Psychologists on the whole disregarded 
this advice, and preferred Binet's solution to the problem of measurement. I have suggested 
(Eysenck, 1967) that there was a good deal of evidence to support Galton's notion, and that 
speed of intellectual functioning, or information processing, was the vital biological 
component in intelligent behaviour. This notion was taken up by the Erlangen School in 
Germany, and by Jensen and his followers in the United States (Eysenck, 1985); a great deal 
of work done by them on simple and choice reaction time, inspection time, reaction time 
paradigms incorporating short-term memory and long-term memory, and various other 
similar paradigms has been used to correlate mental speed with IQ, and the outcome has been 
most impressive, with quite high correlations emerging. Indeed, these correlations of speed of 
reaction to very simple stimuli with psychometric tests of intelligence can be as high as those 
between IQ tests themselves, such as the Wechsler with the Binet. Such findings would be very 
difficult to explain in terms of such theories as those proposed by Sternberg. 


It must be admitted that some of this material is of doubtful value. Some experimenters 
have used unduly wide ranges of ability, including severe retardates in their sample. Others 
have made the opposite mistake and worked with a very restricted range of ability, e.g. 
university students. When appropriate corrections are made, however, it can be seen that the 
average correlation between IQ and choice reaction time, variability of reaction time, 
inspection time, and the various short-term and long-term memory paradigms is around 0.5, 
rising to 0.7 when multiple Rs are used, or when correction is made for attenuation. 
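
The two kinds of correction referred to here can be illustrated with the classical formulae. The sketch below is illustrative only: it uses Spearman's correction for attenuation and the standard Thorndike Case II correction for restriction of range, which may or may not be the exact procedures applied in the studies reviewed. The reliabilities (0.70 for each measure) and the ratio of standard deviations are assumed values chosen for the example, not figures taken from those studies.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation.

    Divides the observed correlation by the square root of the product
    of the two measures' reliabilities.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


def correct_range_restriction(r, sd_ratio):
    """Thorndike Case II correction for direct restriction of range.

    sd_ratio is the unrestricted SD divided by the restricted SD
    (greater than 1 for a selected sample such as university students).
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    return (r * sd_ratio) / math.sqrt(1 + r * r * (sd_ratio ** 2 - 1))


# Assumed values for illustration only.
observed_r = 0.5          # uncorrected correlation, magnitude as in the text
assumed_rel_iq = 0.70     # hypothetical reliability of the IQ measure
assumed_rel_rt = 0.70     # hypothetical reliability of the reaction-time measure

corrected = disattenuate(observed_r, assumed_rel_iq, assumed_rel_rt)
print(round(corrected, 2))
```

With these assumed reliabilities an observed correlation of 0.5 rises to about 0.71, which is the order of increase described above; different reliabilities would of course give a different corrected value.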


Even more interesting is recent work on genuine physiological measures, such as the event 
related potential on the EEG. Correlations between IQ and such evoked potential parameters 
as latency, amplitude and variability have been found to exceed 0.8 in several studies, and 
although the paradigms used by such workers as the Hendricksons, Shaffer, Haier and others 
differ, there is now very little doubt that quite high correlations between IQ and physiological 
reactions to very simple stimuli do exist, and require an explanation (Eysenck, 1985). Such an 
explanation can hardly be found outside the concept of biological intelligence, and hence these 
results, very much like those on reaction time, support Galton’s original view. 


Findings such as these clearly contradict Sternberg's theorising, and certainly find no 
explanation in terms of triarchic theory. One would have expected a discussion of these 
findings, together with some hint as to how Sternberg himself envisages their fitting into his 
scheme, or the degree to which they are incapable of being so fitted. Unfortunately there is 
practically no mention of this work in his book. This is a second grave criticism which I think 
must be made of his approach. We cannot pick and choose, in our choice of data, accepting 
those we like to incorporate in our theory, and rejecting those we do not like because they 
would seem to threaten our conclusions. There may be ways of incorporating reaction time 
and EEG studies in Sternberg’s general view, but at the moment I fail to see how this could be 
done. Failure to consider well established facts must therefore weigh heavily in one's 
judgments of the triarchic theory of intelligence. From my own point of view, the failures I 
have noted are fatal; the theory contributes to an understanding of human behaviour in 
general, but cannot be regarded as a theory of intelligence. 


CORRELATIONS AMONG ELEMENTARY 
COGNITIVE TASKS 


By PAUL KLINE, JON MAY AND COLIN COOPER 
(Department of Psychology, University of Exeter) 


Summary. Eight elementary cognitive tasks were selected because of their potential 
application to the study of flexible thinking. The intercorrelations between these tasks 
were computed to investigate whether individual differences in performance could be 
found. These correlations are discussed in terms of the cognitive processes underlying the 
tasks, and it is concluded that seven of the eight tasks hold promise for future evaluation. 


INTRODUCTION 


A COMMON criticism of factor analysis from experimental psychology is that the 
emergence of a factor per se tells us nothing about its nature. For this reason 
amongst others Hunt (1978) has examined the cognitive processes underlying verbal 
ability, and has argued that such processes as have been uncovered may not be 
linearly combined in verbal ability. This means, of course, that factor analysis can 
never reveal processes because of its assumption of additivity. Hunt and his 
colleagues (e.g., Hunt et al., 1975) have provided a considerable and stimulating 
body of research of this kind. 


Some factor analysts have also recognised this difficulty. Cattell (1971) advo- 
cated the merging of experimental and factor-analytic psychology, and his ADAC 
model was a good speculative example of this genre. Even earlier, Eysenck (1953) 
was making this point. More recently, Carroll (1978, 1980), has made a systematic 
attack on the problem by listing a huge variety of cognitive processes and the tasks 
which measure them together with the best-established ability factors. His work 
complements, from a factor analytic stance, the cognitive approach of Hunt. 


A further stimulus to this new psychometrics, and perhaps the most influential 
of all, is the componential analysis of abilities first proposed by Sternberg (1977) in 
the study of analogies. Components are processes with a fixed time duration and 
probability of execution, and componential analysis allows powerful prediction of 
an individual's performance on a variety of cognitive tasks, as many tasks can be 
seen as different combinations of a relatively small group of processes. 


Work from these three sources, although different in detail, indicates that the 
combination of psychometric analysis and cognitive psychology can lead to valuable 
insights into the nature of factors and in the study of abilities (at least) as has been 
argued previously (Kline, 1984). 


Carroll (1980) stresses that to ascertain the possibility of measuring important 
dimensions of human cognitive ability through various types of simple cognitive 
tasks a number of questions must be addressed, the following being of “paramount 
interest”: “What kinds of individual differences are observed in these tasks? Do 
they exhibit sufficient variance to measure stable individual characteristics that may 
be of relevance in personnel selection and other operational contexts? . . . How are 
they related to those observed through more conventional psychometric tests?” The 
present study addresses the first two of these issues in a study of flexible thinking 
among officers which was being undertaken for the Ministry of Defence. 



METHOD 


In the research to be reported in this paper, it was decided to adopt the pro- 
cedures advocated by Carroll (1980) for the study of processes underlying cognitive 
factors, because these were most suitable for the overall plan of the research and 
because, a priori, they seemed to be the most successful in the study of flexibility. It 
must be stressed that there are no previous attempts to investigate cognitive 
processes and flexible thinking. No apology is made for selecting a subset of tasks 
thought to be of relevance to flexibility, rather than a more exhaustive selection, 
since Carroll (1980) observes that there is reason to suspect that at least 30 factors 
can be found running through cognitive tasks. This clearly makes any comprehensive 
factoring a Herculean labour, and it is necessary first to perform “studies 
that focus on small sets of variables in the various subdomains, with the hope that 
eventually the structure of the total domain can be inferred . . .” (p. 120). 


Carroll has specified a large number of elementary cognitive tasks (ECTs) to 
measure cognitive processes underlying factors. First, it is necessary to define the 
term ECT. He suggests the following definition: 


“An . . . ECT is any one of a possibly large set of tasks in which a person undertakes 
or is assigned a performance for which there is a specifiable class of 
‘successful’ or ‘correct’ outcomes or end states which are to be attained 
through a relatively small number of mental processes or operations and whose 
successful outcomes can differ, depending on the instructions given to, or the 
sets or plans adopted by the person” (1980, p. 11). 


His survey of experimental cognitive psychology yielded eight basic paradigms each 
embracing a large number of ECTs: 


(1) Perceptual apprehension: Determination of threshold values of stimulus 
    characteristics, for example. 

(2) Reaction time and movement: Detection of stimulus onset, for example. 

(3) Evaluation and decision: e.g., the truth of a statement based on general 
    knowledge. 

(4) Stimulus matching/comparison: Tasks involving same/different judgements. 

(5) Naming/reading or association: Naming a picture, supplying an antonym of a 
    stimulus word. 

(6) Episodic memory readout: Here subjects learn X and produce Y, e.g., 
    short-term memory tasks. 

(7) Analogical reasoning: This is a task well known to ability testers. 
    Actually, as Carroll points out, this paradigm may be logically different 
    from the others since it may involve a number of other paradigms; thus 
    A:B = C:? involves (4) and others. 

(8) Algorithmic manipulation: These tasks may involve algorithms which are 
    already known (e.g., long division) or those defined in the experiment. 


Underlying the eight categories, it is argued further by Carroll (1980), there are 10 
basic cognitive processes. These are the monitor, attention, apprehension, percep- 
tual integration, encoding, comparison, co-representation (as used in foreign 
vocabulary), co-representation retrieval, transformation (e.g., mental rotation) and 
response execution processes. 


The plan of attack advocated by Carroll is to study these basic processes by 
means of the eight paradigms. Processes should be common to all paradigms, hence 
if an ECT in one paradigm measures the same process as one in another (despite 
apparent dissimilarities) there should be a high correlation between them. From his 
examination of the paradigms from the viewpoint of processes, Carroll was able to 
list nearly 200 ECTs with an indication of the cognitive processes they were 
supposed to tap. 
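
The logic of this plan, that two ECTs tapping the same process should correlate highly whatever their surface differences, can be illustrated by computing the intercorrelations among a small set of task scores. The following is a minimal sketch only: the task names and scores are invented for the example and are not data from this or any other study, and the product-moment coefficient is assumed as the index of association.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must show some variance")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(scores):
    """Return a nested dict of intercorrelations between all pairs of tasks."""
    names = list(scores)
    return {a: {b: pearson_r(scores[a], scores[b]) for b in names} for a in names}


# Hypothetical scores for six subjects on three invented tasks.
scores = {
    "task_a": [410, 455, 390, 520, 480, 430],  # e.g. a latency in ms
    "task_b": [415, 450, 400, 510, 490, 425],  # assumed to share a process with task_a
    "task_c": [12, 9, 15, 11, 10, 14],         # an unrelated count score
}

matrix = correlation_matrix(scores)
for row_name in scores:
    print(row_name, [round(matrix[row_name][col], 2) for col in scores])
```

On this reasoning a high coefficient between the first two invented tasks would be read as evidence of a shared process, although a high correlation alone cannot show which process is shared.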


In this investigation we examined these ECTs and selected those which 
appeared to be most relevant to flexible thinking, as conceptualised in the work of 
Guilford (1967) as divergent thinking and as the second-order retrieval factor of 
Cattell (1971). A further constraint on the selection was the need to concentrate on 
those aspects of flexibility which were most germane to military problems — those 
of changing set and interference. 


ECTs used in the study 

The ECTs used in this research are listed below. The rationale for selecting 
these measures, the cognitive processes which it is assumed that they tap, and a brief 
description of the task are given in each case, together with details of the score(s) 
derived. All ECTs were presented on a BBC microcomputer with a medium-resolution 
video monitor; reaction times were measured with 0.001 s accuracy. 


ECT 1: Degraded words 1. Words were presented on the screen degraded to 
various extents by switching off varying numbers of pixels. Subjects were asked to 
type in what they thought each word was each time a word was presented. The first 
six words were only slightly degraded and established a set for farm-related words. 
Of the next 24 words, 16 were again farm-related, the remainder were not. The 
number of farm-related responses to these words was scored, the rationale being 
that flexible subjects, who are expected to be less influenced by internal sets than 
rigid subjects, should find it easier to change sets and not be misled into interpreting 
the probes as being farm-related. It is anticipated that individuals' reliance on set 
would have developed in response to their proficiency at perceptual apprehension as 
described by Carroll (1980) — thus someone with poor perceptual apprehension 
would, over time, come to rely more on internal information (set) than would 
someone with good perceptual apprehension. 


ECT 2: Degraded words 2. Words were presented on the monitor by succes- 
sively exposing the eight rows of pixels. After exposing each row, subjects were 
invited to guess what the word was, or to “pass” and see an additional row of dots.
This task therefore measures the willingness of subjects to guess about the stimuli.
The words continued to build up row after row until correctly identified; there were
no penalties imposed for incorrect guesses, an incorrect guess being treated as a
“pass”. There were therefore eight chances to guess at each word, the score being
the total number of presentations needed to work through the 30 words in the test. 


Subjects who are willing to make guesses on minimal information will in the 
long run tend to have lower scores than subjects who wait until they are sure of a 
word before responding. Although this may seem to be no more than a measure of 
riskiness, the fact that there is nothing to lose by guessing wrongly enables the test 
score to reflect an individual's willingness to risk making a mistake — and, as in the 
previous test, this could be related to their confidence in perceptual apprehension, 
although temperamental factors (such as obsessionality) could also be involved. 


ECT 3: Ambiguities. Sixty phrases were presented, and the subject had to judge 
them as true or false. There were three categories of phrase: true (e.g., “a dog has
legs”), false (e.g., “a ram has moons”) and ambiguous (e.g., “a plane is a tree”).
Each ‘‘ambiguous’’ phrase was true if the less common of the noun's two meanings 
was apprehended, but false if the more common meaning was taken. The task was 
designed to measure speed of access to internal lexica. If a subject could not access 
words and their meanings quickly, then the need to respond rapidly in the task 
would prevent an ambiguous noun's less common meaning from being available 
before a ‘‘false’’ response was made. Subjects who could access lexica quickly would 
become aware of the less common meanings in time for these to interfere with, and 
possibly overcome, the ‘‘false’’ response, so leading to a longer reaction latency 
and/or a ‘‘true’’ response. This ECT is thought to be concerned with the process of 
evaluation and decision. 


Two measures were derived from this task. The first is the ratio of mean 
response latency in making a “true” response to an ambiguous item to mean
response latency in making a ‘‘true’’ response to an unambiguous true item. The 
second is similar for the false responses — i.e., the ratio of response latencies to 
ambiguous items seen as false to the mean response latency to false phrases seen as 
false. 
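
The two ratio scores can be illustrated with a minimal sketch in Python. The trial format, the function name `ambiguity_ratios` and the treatment of empty cells are assumptions made for illustration; they are not taken from the original scoring program.

```python
from statistics import mean


def ambiguity_ratios(trials):
    """Compute the two ECT 3 ratio scores for one subject.

    trials: list of (item_type, response, latency_s) tuples, where
    item_type is "true", "false" or "ambiguous" and response is
    "true" or "false" (hypothetical format).
    Returns (true_ratio, false_ratio); a ratio is None if a cell is empty.
    """
    def cell_mean(item_type, response):
        values = [lat for kind, resp, lat in trials
                  if kind == item_type and resp == response]
        return mean(values) if values else None

    def ratio(numerator, denominator):
        if numerator is None or denominator is None or denominator == 0:
            return None
        return numerator / denominator

    # Ambiguous items judged true, relative to unambiguous true items judged true.
    true_ratio = ratio(cell_mean("ambiguous", "true"), cell_mean("true", "true"))
    # Ambiguous items judged false, relative to false items judged false.
    false_ratio = ratio(cell_mean("ambiguous", "false"), cell_mean("false", "false"))
    return true_ratio, false_ratio
```

With raw latencies, a ratio above 1 in this sketch would indicate slower responding to ambiguous items than to unambiguous items that received the same response.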


ECT 4: Homophones. Eighty pairs of words were presented to subjects who 
had to decide whether they made sense. Half the pairs contained a homophone, and 
half made sense, yielding four types of phrase: (a) normal sense, e.g., coal mine, (b) 
homophone sense, e.g., coal seam, (c) normal nonsense, e.g., clock mine, (d) homo- 
phone nonsense, e.g., coal seem. 


The rationale of this test is similar to the previous one — interference from the 
alternative meanings of the homophones should increase reaction times to such 
phrases. A large difference in mean reaction time between homophonic and normal 
stimuli may therefore reveal more interference, and greater flexibility on the part of 
the subject. Thus the score taken was the difference between mean response latencies 
to homophones and non-homophones for all items that were correctly answered. 


ECT 5: Shape comparison. Here subjects were presented with either two 
shapes, two names of shapes, or a shape and a name, (e.g. [circle circle], [O O], [O 
circle]) the task being to decide whether the members of the pair referred to the same 
shape. The reaction time for a shape and its name should be longer because of the 
need for additional processing (access of name code). 


This task has received minute examination over the years, e.g., by Saul 
Sternberg (1975) and Posner (1978). The process thought to underlie this ECT is of
course speed of access to name codes. Rapid access here could well be important in 
flexible problem solution and ideational fluency. Three response measures were 
computed, the most important being the mean shape-name minus the mean shape- 
shape response latency for items answered correctly. Examination of the data 
revealed a large practice effect on this task, so response latency was regressed on 
trial number, the slope of the regression line and goodness of fit (r) being computed 
for each case. However if r was not significant, it and the slope were treated as 
missing data. 
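
The practice-effect measures can be sketched as follows. This is an illustration only: the function name `practice_effect` is hypothetical, and the significance criterion is assumed to be supplied by the caller as a critical value of r taken from standard tables, since the significance level used is not stated above.

```python
from math import sqrt


def practice_effect(latencies, r_critical):
    """Regress response latency on trial number for one subject.

    latencies: latencies in presentation order (trial 1, 2, ...).
    r_critical: smallest |r| counted as significant for this number of
    trials (assumed to come from a table of critical values).
    Returns (slope, r), or (None, None) when r is not significant,
    mirroring the treatment of such cases as missing data.
    """
    n = len(latencies)
    if n < 3:
        return None, None
    trials = range(1, n + 1)
    mean_x = sum(trials) / n
    mean_y = sum(latencies) / n
    sxx = sum((x - mean_x) ** 2 for x in trials)
    syy = sum((y - mean_y) ** 2 for y in latencies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(trials, latencies))
    if syy == 0:
        return None, None  # constant latencies: r is undefined
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    if abs(r) < r_critical:
        return None, None  # non-significant fit is treated as missing
    return slope, r
```

For example, `practice_effect([1.0, 0.9, 0.8, 0.7, 0.6], r_critical=0.878)` uses the two-tailed 5 per cent critical value for five observations; a negative slope would indicate faster responding with practice.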


ECT 6: Memory task. For this task, two lists of nouns were presented sequen- 
tially to subjects, with an exposure time of 3s, and 1.5s between presentations. Each 
list comprised 10 words in each of three categories (e.g., fruits, geographical 
features, kitchen tools). The first list of 30 words was ordered by category (the 10 
fruits, etc.) whilst the second showed the words in random order. Subjects were 
asked to recall as many items as possible immediately after the presentation of each 
list. The ratio of the recall scores for the two lists gives an indication of the ability of 
a subject to organise the material for recall, as it is easier to remember a categorised
than a random list. Such organisation demands access to long term memory, and it 
is arguable that flexible subjects will perform better at this task than rigid subjects if 
it is assumed that the former have more rapid access to long term memory. A 
measure of extent of clustering (the ratio of the squared “run lengths” for like categorised
items between the two conditions) was computed as an index of the tendency
to impart order to unordered data. 
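
One possible reading of the clustering index is sketched below. The description above is brief, so several details are assumptions: that a "run" is a sequence of consecutively recalled items from the same category, that squared run lengths are summed within each recall protocol, and that the random-list value is divided by the categorised-list value. The function names are hypothetical.

```python
from itertools import groupby


def squared_run_lengths(recalled_categories):
    """Sum of squared lengths of runs of consecutive same-category items.

    recalled_categories: the category label of each recalled word,
    in the order in which the subject recalled them.
    """
    return sum(len(list(run)) ** 2 for _, run in groupby(recalled_categories))


def clustering_index(random_list_recall, categorised_list_recall):
    """Ratio of summed squared run lengths between the two conditions.

    The direction of the ratio (random over categorised) is an assumption.
    Returns None if nothing was recalled from the categorised list.
    """
    denominator = squared_run_lengths(categorised_list_recall)
    if denominator == 0:
        return None
    return squared_run_lengths(random_list_recall) / denominator
```

Squaring the run lengths weights long runs heavily, so a subject who recalls the randomly ordered list in category blocks obtains a markedly higher value than one who recalls it in an unclustered order.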


ECT 7: Stroop test. In this, the well known Stroop task was presented on the 
computer. There were three parts to this test following initial training: 


(a) names of four colours were presented sequentially in white lettering, 
subjects having to press the button of the corresponding colour as rapidly as 
possible. 


(b) as above, but with the words being written on the screen in colour rather 
than white. The “ink colour” did not necessarily match the named colour.


(c) as in (b), but with subjects now being asked to respond on the basis of ‘‘ink 
colour” rather than the colour that was named. 


The differences in reaction time between (b) and (a) and between (c) and (a)
indicate the effect of interference — in the first case that of irrelevant colour, and in 
the second of irrelevant name. Although rigid subjects should perform better at 
tasks (b) and (c) than flexible subjects (who may be less able to ignore parts of the 
stimulus) they should find the change in conditions between (b) and (c) hard to cope 
with. As a measure of this, the slope of response latency in part (c) was also calculated,
together with overall goodness of fit (r²). As before these data were treated as
missing when r was non-significant. This task is, of course, part of the paradigm of 
naming or association in Carroll’s (1980) analysis. 
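
The two interference scores amount to differences between condition means, as in the following minimal sketch. The function name `stroop_interference` and the dictionary keys are hypothetical, and the latencies are assumed to have been restricted beforehand to whichever trials are to be scored.

```python
from statistics import mean


def stroop_interference(rt_a, rt_b, rt_c):
    """Interference scores for one subject on the three-part Stroop task.

    rt_a: latencies for part (a), colour names shown in white.
    rt_b: latencies for part (b), names shown in colour, respond to the name.
    rt_c: latencies for part (c), names shown in colour, respond to the ink.
    """
    baseline = mean(rt_a)
    return {
        # (b) minus (a): effect of the irrelevant ink colour
        "colour_interference": mean(rt_b) - baseline,
        # (c) minus (a): effect of the irrelevant colour name
        "name_interference": mean(rt_c) - baseline,
    }
```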


Subjects 

Forty-three undergraduates volunteered to take these ECTs; 25 were male. The 
total testing time was about 45 minutes, and rapport was excellent as the subjects 
found the ECTs enjoyable. 


Research design and statistical analysis 

The analyses to be performed here are concerned with the functional relation- 
ships between various ECTs, and it is clearly desirable to eliminate the effects of 
gross individual differences in motor speed and processing capacity. A difference of 
0.2 s between two experimental conditions means the same in terms of a particular
process whether the individual's two mean reaction times were 0.6 s and 0.4 s or 2.6 s
and 2.4 s. It is therefore unwise to take ratios of reaction times directly (as in ECT 3,
for example). Reaction time data were therefore standardised for each task within 
each subject (as inspection showed few severe departures from normality) before 
computing the statistics mentioned above. That is to say, for each ECT the reaction 
time scores reflect differences from each individual’s mean for that task. The 
measures described above were then intercorrelated. 
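
The within-subject standardisation can be sketched as below. The function name is hypothetical, and it is assumed here that each latency was expressed as a deviation from the subject's task mean and divided by that subject's standard deviation for the task; the use of the population rather than the sample standard deviation is a further assumption.

```python
from statistics import mean, pstdev


def standardise_within_subject(latencies):
    """Convert one subject's latencies on one task to z-scores.

    Each value becomes its deviation from the subject's own task mean,
    divided by the subject's own standard deviation for that task.
    """
    centre = mean(latencies)
    spread = pstdev(latencies)
    if spread == 0:
        return [0.0 for _ in latencies]  # no variation to standardise
    return [(value - centre) / spread for value in latencies]


# A fast and a slow subject showing the same absolute difference
# between two conditions receive the same standardised scores.
fast = standardise_within_subject([0.6, 0.4])
slow = standardise_within_subject([2.6, 2.4])
```

After this step the overall speed of a subject no longer enters the derived scores, which is the purpose of the standardisation described above.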


RESULTS 
Table 1 shows the correlations between the variables; since not all subjects 
completed all tests (or yielded an insignificant r in ECTs 5 and 7), this table also 
shows the number of cases from which each coefficient was computed. Thirteen of 
the 80 correlations were significant with P < 0.05.
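
Because the number of cases differs from one coefficient to another, each correlation is presumably based on those subjects with scores on both variables. A minimal sketch of such a pairwise computation follows; the function name and the use of `None` to mark a missing score are assumptions made for illustration.

```python
from math import sqrt


def pairwise_correlation(xs, ys):
    """Pearson r over cases where both scores are present (not None).

    xs, ys: scores for the same subjects in the same order.
    Returns (r, n), where n is the number of complete cases; r is None
    when fewer than 3 complete cases remain or a variable is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return None, n
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / sqrt(sxx * syy), n
```

Reporting n alongside each coefficient matters because the critical value of r depends on the number of cases on which it is based.
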
ECT 1, where the score was the number of farm-related words, was supposed to
measure resistance to changing sets, and hence inflexibility. This correlated signifi- 
cantly with the practice effect involved in responding to a shape and a name (ECT 
5). It was negatively correlated with ECT 6: the subject who gets set in expecting a 
certain category of words is not good at creating categories of his own to aid 
memorisation, which makes sound psychological sense. This negative correlation 
supports the implication of categorisation in ECTs 1 and 6. More speculatively, the 
correlation with the practice effect in the Posner task also suggests that ECT 1 is 
implicated in rigidity; subjects with strong set on ECT 1 start off poorly on ECT 5 
but improve as the task continues. 


ECT 2, the number of exposures needed to identify degraded words, did not 
correlate significantly with any other ECT. This supports the putative non-cognitive 
nature of this task — it may measure risk taking as a personality trait rather than a 
cognitive process. It may be of use in a broader study of flexibility, but as a process 
measure this ECT seems to be of little use. 


ECT 3. The first thing to note about these results is that the two measures from 
this test do not intercorrelate, indicating that “true” and “false” responses to
ambiguous items occur after varying amounts of interference for each subject. 


The degree of interference occurring when ‘‘true’’ responses were made was 
positively correlated with the reaction times to ECT 4, interference of homophones. 
This is as it should be, since both should be affected by speed of lexical access. It was 
negatively correlated with interference of the irrelevant colour on the Stroop test, 
which is also predictable since the faster a subject’s lexical access, the less time there 
would be for interference of colour to occur before the response to the name is 
made. The individual who responds “false” to an ambiguous item after a long pause
tends to group items for recall in the memory task. A plausible explanation for this 
finding is given in the discussion of ECT 4, below. 


ECT 4, homophone interference, is, like ECT 3, negatively correlated with ECT 
6, the ability to categorise material to be remembered. The individual who suffers 
interference from homophones (presumably because more than one meaning is 
recognised) may perhaps notice several different possible categorisations for each 
noun during the memory task, and this may hinder performance on ECT 6. 


ECT 5 measures the difficulty of access to name codes. It correlates with the 
reliability of the practice effect in learning this task, positively with interference of 
colour, and negatively with the interference of name in the Stroop task. Thus slow 
access to name codes allows colour to interfere when names must be reported, but 
prevents names from interfering when colours must be reported, precisely as 
expected. 


The salient correlations of ECTs 6 and 7 have been discussed above. 


DISCUSSION 
This pilot study of correlations between some elementary cognitive tasks 
designed to tap processes underlying abilities has revealed a number of interesting 
points, discussed above. It seems to us that a number of conclusions may be drawn. 


(a) These ECTs created sufficient variance for individual differences to be 
tapped, and hence, in the best psychometric tradition, they should be studied 
further. 


(b) There are some significant correlations between the ECTs. As we have seen, 
these correlations make sound psychological sense when interpreted in terms of cognitive
processes. However, ECT 2 shows no appreciable correlations with the other
measures, and may well reflect obsessionality. 


(c) The significant correlations are however generally not sufficiently large to 
enable the battery of ECTs to be shortened by removing redundant tasks. 


(d) The presence of such correlations suggests that some ECTs may load on 
broader, primary ability factors. As Kline (1979) argued (without empirical support, 
however) it could be that ECTs represent independent components of primary 
factors. Thus they could each correlate appreciably with a primary ability factor, 
but intercorrelate not at all. 


(e) This suggests that these ECTs warrant further study with measures of the 
primary ability factors: it is hoped that these findings will encourage researchers to 
use these and other ECTs in an effort to elucidate the nature of these factors, as in 
the work of Hunt and Sternberg amongst others. 


Note. — This study was part of a research agreement with the Ministry of Defence for the 
Army Personnel Research Establishment. 


Requests for reprints should be sent to Professor Paul Kline, Department of Psychology, 
University of Exeter, Exeter, EX4 4QG, England. 
APPLICATION OF THE INFORMATION PROCESSING
APPROACH TO THE DESIGN OF A NON-VERBAL 
REASONING TEST 


By PAULINE SMITH 
(Test Development Unit, National Foundation for Educational Research, Slough) 


Summary. A major criticism of currently available psychometric tests, particularly those 
purporting to measure general intelligence, is that they were constructed without any 
theoretical rationale for the choice of item types (e.g., Levy and Goldstein, 1983). This 
paper attempts to evolve such a rationale for one particular purpose; namely, the 
development of a non-verbal reasoning test for 10- to 11-year-olds. In order to achieve 
this end, both traditional psychometric approaches to intelligence and recent work by 
cognitive psychologists are examined. It is found necessary to analyse critically the nature 
of the ability which is to be tested. Following this, some recent studies are outlined which 
may provide a basis for analysing the structure of items within each chosen type. The 
potential benefits of analysing items at a sub-type level are discussed, both in terms of 
theoretical and practical considerations. 


INTRODUCTION 


IN recent years, cognitive psychologists have begun to examine individual
differences in cognitive task performance, traditionally the domain of psychometricians
(see Carroll and Maxwell, 1979; Sternberg and Detterman, 1979). Of
particular interest to test designers is what has been termed the “cognitive components”
approach (Pellegrino and Glaser, 1979). Researchers in this area have
sought to account for performance upon complex test items (as used in standard 
“intelligence” tests) by applying a process model which assumes that task perfor- 
mance consists of a series of information processing components (e.g., Sternberg, 
1979). Differences in response latency (time taken) and error rates have been 
examined in relation to systematic variations in item structure, and in relation to 
overall performance upon standard psychometric tests. 


One important by-product of this approach has been the emphasis upon criteria 
for task selection. Decisions made about the suitability of tasks for “componential
analysis” may also apply to the design of new psychometric tests. Sternberg (1979,
1982) has criticised traditional ‘‘differential’’ approaches to task selection, on the 
grounds that they are not theory-based. He noted two main techniques: namely, 
choosing tasks which have been previously used (historical precedent) or choosing 
tasks which correlate at an intermediate level with the factor of interest. Equally, he 
observed that mainstream cognitive psychology was in danger of becoming a study 
of specific tasks, lacking any theoretical basis or practical utility. 


As an alternative means of selecting tasks, Sternberg proposed four selection 
criteria, namely: 


(1) quantifiability (i.e., within the task, it is possible to assign numerals to 
objects or events according to rules); 

(2) reliability; 

(3) construct validity; 

(4) predictive validity. 


It can be seen that while all four criteria would have to be applied to totally new 
tasks, those tasks (“item types”) which have already been used in standardised
psychometric tests could be assumed to satisfy (1) and (2). In addition, the better
tests would probably have used tasks of proven predictive validity. Therefore, if
previously-used tasks are to be used either for componential analysis or in psychometric
testing, it is essential that their construct validity (and possibly predictive
validity) is examined.


THE NATURE OF NON-VERBAL REASONING 


From the above discussion, it is clear that the first step in deciding what item
types to use as measures of a particular construct is to define the meaning of that
construct. In the present case, this involves the separate stages of defining what
processes are implied by “reasoning” and what content is implied by “non-verbal”.


(i) Reasoning 

Sternberg (1982) concluded that while most theorists agreed that ‘‘reasoning’’ 
was an important subject of ‘‘intelligence’’, no satisfactory definition was available 
for either term. This being so, it is necessary to survey past theoretical characterisa- 
tions of ‘‘reasoning’’ in order to derive a working definition. 


Cognitive processes described as ‘‘reasoning’’ have been investigated within 
several branches of psychological research. However, given that this paper is 
concerned with the assessment of differences in reasoning ability among children of 
a similar age, only a proportion of these approaches seem to be directly relevant. 
The following review will therefore be limited to research conducted within an 
individual differences framework or which provides evidence as to operations which 
are fundamental to ‘‘non-verbal reasoning’’. Given these restrictions, language- 
based studies of formal reasoning (e.g., Wason and Johnson-Laird, 1972) or 
concept attainment in adults (e.g. Bruner et al., 1956) will be omitted, as will
analyses of age-related qualitative changes in reasoning (e.g. Piaget, 1954; Bruner, 
1964). 


Psychometric investigations of mental ability have been well-reviewed 
elsewhere (e.g., Carroll, 1982), so only a summary of the main findings relevant to 
reasoning will be presented here. 


Much of the early work on factor theories of intelligence derived from Spear- 
man’s (1927) “two-factor” theory, wherein intelligence tests were said to measure 
one factor general to all tests and one specific to each particular test. Spearman 
(1923) concluded that ‘‘g’’ largely represented reasoning ability, which he described 
as consisting of “the eduction of relations” and “the eduction of correlates”. From
this theory, a number of hierarchical models were developed, all of which included a 
general factor and several group factors (see Vernon, 1950). 


In contrast, US psychologists such as Thurstone (1938) and Guilford (1967) 
attempted to show the existence of several equally important factors. Thurstone 
called one of his factors ‘‘reasoning’’ and found that tests of classification, rule- 
finding and problem-solving loaded most highly on it. Vernon (1950) pointed out 
that if Thurstone's primary factors were re-analysed to reveal a general factor, then 
“reasoning” loaded more highly on “g” than did other primary factors such as
“rote memory”. 


Guilford's ‘‘Structure of Intellect’’ model of intelligence involves 120 theoreti- 
cally-distinct factors. Guilford and Merrifield (1960) considered the relations 
between traditional concepts of reasoning and aspects of the model. Induction was 
identified with “the cognition of classes, relations or systems”, where cognition was
used in a special sense, meaning “discovery, awareness, comprehension or understanding”.
“Cognition of classes” was equated with “classification” or “concept
attainment” and “cognition of relations” with Spearman’s “eduction of relations”.
An early research project conducted by Guilford and his associates (1955) is
also of relevance here. They examined three reasoning factors identified by US Air 
Force Research (Guilford, 1947). Two of these factors clearly corresponded to 
classification and to induction. The third factor was termed “general reasoning”
since it appeared in nearly all reasoning tests and many non-reasoning tasks, when
the latter became difficult. As a result of a factor-analytic study, Guilford concluded
that “general reasoning” largely represented the ability to define and structure
problems. 


Cattell (1963) distinguished between ‘‘crystallized and fluid-analytical’’ general 
intelligence. The former referred to reasoning ability using learned material, such as 
language or numbers, whereas the latter involved reasoning with ‘‘meaningless’’ 
new material. Cattell believed that measuring fluid ability was a valuable means of 
reducing the effects of culture or socio-economic background on intelligence test 
scores. To this end he designed the ‘‘Culture-Fair Intelligence Test’’, using subtests 
of non-verbal items requiring classification, relation-finding and the analysis and 
integration of information. Although the notion of a ‘‘culture-free’’ test has attrac- 
ted critics (e.g., Vernon, 1969; Greenfield and Bruner, 1969), it is generally accepted 
that such a test will give a fairer indication of intelligence (than verbal tests) among 
the linguistically-handicapped. 


The “cognitive components” approach described earlier has provided a more
precise method of specifying which operations are carried out in the performance of 
reasoning tasks. As stated before, each task is first analysed into hypothesised com- 
ponent processes, providing a model to be tested. By systematic changes in task 
demands, within-subject parameters can be estimated (latency, error rate) for each 
component by using a form of subtraction methodology. For example, the components
in an analogy processing model may be estimated by presenting “A is to B
as C is to”, waiting until the person indicates understanding and then timing how
long it takes to respond once the “D” choices are presented (see Sternberg, 1977, for
full account). Sternberg and Gardner (1982) have reported such analyses of series 
completion, classification and analogies. These tasks were chosen according to 
Sternberg's four criteria, though the authors admit that other item types would have 
satisfied the criteria equally well, in particular, matrix completion. The components
which their theory proposes are utilised in the solution of these item types are encoding,
inference, mapping, application, comparison, justification and response (though
not all these would necessarily be used in any one item type). It is claimed that
“inference” and “application” correspond to Spearman's “eduction of relations”
and “eduction of correlates” (Sternberg, 1979).
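
The logic of the subtraction methodology can be shown with a minimal sketch. It assumes a simplified design in which 0, 1, 2 or 3 terms of the analogy are shown in advance and the remaining solution time is recorded; the function name and this data layout are hypothetical. The published analyses are more elaborate than simple pairwise subtraction (see Sternberg, 1977).

```python
def component_estimates(mean_latency_by_cue):
    """Estimate time spent on each precued portion of an analogy.

    mean_latency_by_cue: dict mapping the number of analogy terms shown
    in advance (e.g. 0, 1, 2, 3) to the mean solution latency in seconds
    measured after the advance information was removed.
    Returns a dict mapping (fewer_terms, more_terms) to the difference in
    latency, i.e. the time attributed to processing the extra term(s).
    """
    cues = sorted(mean_latency_by_cue)
    return {
        (lower, upper): mean_latency_by_cue[lower] - mean_latency_by_cue[upper]
        for lower, upper in zip(cues, cues[1:])
    }
```

If the solution latency falls as more terms are shown in advance, each difference is positive and is read as the time taken by the processes applied to the additional term.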


One important finding of this work was that there was a strong correlation 
between the response component latency and (reference-test) reasoning ability. 
Sternberg noted that apart from psychomotor response speed, the response component
measured time taken by “control” components general to all tasks, which he
termed “metacomponents”. These are the processes by which a subject determines
how to solve an item, by selecting and controlling the use of the components, repre- 
sentations and strategies needed to solve each problem. These metacomponents 
seem to have much in common with Guilford's (1947) “general reasoning” factor
(see above). 

In conclusion, any task designed to test “reasoning” ability should necessitate
the use of one or both of two main processes, these being the abstraction of rules or 
classes from given information and the application of such inferred rules or concepts to 
new information. 


(ii) The meaning of “non-verbal”
While the ‘‘verbal-non-verbal’’ distinction seems superficially obvious, it is 
apparent from a brief survey of current tests that some confusion exists. For present
purposes, it may be useful to adopt Guilford’s (1967) classification of content, dis- 
tinguishing between semantic and figural or symbolic materials. ‘‘Non-verbal’’ can 
then be defined as material which does not necessitate semantic knowledge of verbal 
material for its solution. It therefore includes letter or word items (where only alpha- 
betic knowledge is crucial), as well as other symbolic material, figural material and 
pictorial material. If Guilford’s distinction is to be used, then pictorial material will 
only be permitted if item solution does not depend upon semantic interpretation of 
the pictures (i.e., only on visual details). This is important if the test is meant to be 
“culture-fair”. Indeed, Cattell decided that even though pictorial item solution
depended upon physical details, only ‘‘culture-fair’’ elements should be used. Since 
these amounted to only parts of the human body, sun, moon and stars, it might be 
better to avoid pictorial materials completely. 


In the case of non-verbal letter or number material, solution customarily 
depends on basic alphabetical or numerical knowledge. It is assumed that all normal 
testees will have this knowledge. However, evidence by Holzman et al. (1982, 1983) 
suggests that in children, number series and number analogy problems test not only 
abstract reasoning but computational skills as well. Thus it seems necessary to avoid
such material, and also alphabet dependent material, in order that reasoning 
performance be unconfounded by background knowledge. 


It may be noted that in the above discussion, the processes implied by
“reasoning” and the content implied by “non-verbal” have been analysed.
Nowhere is it claimed that the reasoning processes themselves are non-verbal in 
nature. Some writers have supported such an omission, noting that verbal mediation 
may occur in the solution of non-verbal reasoning items. However, to suggest that 
since some people use verbal strategies to solve some non-verbal reasoning items no 
other strategies can be assumed seems too extreme a position. This type of test is 
designed so that it obviates the need to use words as a vehicle for reasoning. That is, 
it enables those people whose cognitive bias is towards visuo-spatial processing to 
exhibit their reasoning ability to its full extent. 


Additionally, many non-verbal reasoning items consist of complex abstract 
figures, upon which various operations are performed. To describe the item in 
words is, in many cases, more difficult than to produce a solution (e.g., Figures 1 
and 2). 


FIGURE 1: NON-VERBAL MATRIX ITEM
FIGURE 2: NON-VERBAL SERIES COMPLETION ITEM
[Figures not reproduced.]


In Figure 1, it is possible that a testee may note that the required operation is a
“reflection”. However, this is not a crucial step in item solution. The testee must 
observe that such an operation is required, but s/he need not know the name of the 
operation. To arrive at the correct solution in Figure 2, some combination of visuo- 
spatial imagery and visual analysis of the figural details would presumably be 
employed: to verbalise the required changes of detail would be extremely complicated.
It therefore seems that “non-verbal reasoning” can be justifiably so-named,
in that not only does it permit non-verbal processing but it minimises the utility of 
verbal mediation strategies. 


Finally, it should be noted that controversy also surrounds the nature of the
processes and representations utilised in the solution of verbal reasoning items. 
Cognitive psychologists disagree as to whether or not people use spatial or linguistic 
mediation to solve such items. The most recent evidence suggests that people use a 
mixture of the two (see review by Sternberg, 1982).


“NON-VERBAL REASONING” ITEM TYPES 


For the purpose of obtaining a collection of potentially usable item types, a 
wide variety of currently available tests and test collections have been surveyed. 
These included any tests purporting to measure “non-verbal reasoning” which were
suitable for the relevant age range (see Appendices for details). 


From these sources a list of potential item types was derived, as follows:


1. Classification (“Unlike” or “Like”);
2. Series completion;
3. Series ordering;
4. Digit-symbol pairing;
5. Matrix Completion;
6. Form Reasoning;
7. Locations/Circle Reasoning;
8. Codes/“secret languages”;
9. Analogies.


Consideration of the above item types in terms of Sternberg’s four criteria and the 
definition of reasoning derived earlier leads to the elimination of “Digit-symbol
pairing”. This appears to be a test of speed and/or visual memory involving little or
no reasoning processes.


A comparison of series completion and series ordering (“choose middle of
five”) tasks suggests that the former may give the clearer measure of “inference and
application” components, and as such, it is to be preferred. It is also the form which
is most similar to the letter and number series which, though not suitable for present 
purposes, are considered to be a very good measure of induction (see Holzman et 
al., 1983). 


The need for flexibility within an item type (i.e., wide possible range of items) 
means that items of the “locations/circle reasoning” type are not really suitable. It
would be rather difficult to create large numbers of different items of a suitable level 
for 11-year-olds. 


“Form reasoning” is also eliminated, though for different reasons. This task
resembles arithmetic calculations, using an artificial system of symbols and operations.
Solution of such items appears to depend upon the concept of substituting
partial products into longer calculations. As such, the task is likely to be more a
measure of mathematical ability, accuracy and speed than “inference and
application”.


The remaining item types should therefore constitute a useful test of non-verbal
reasoning, i.e.: 


1. Classification; 

2. Series completion; 
3. Analogies; 

4. Matrices; 

5. Codes. 


These five basic item types can be seen to measure classification and relation- 
finding, and also the applications of the same to new information. Two basic types 
of classification are in use: ‘‘unlike’’ (‘‘find the odd one out’’) and ‘‘like’’ (e.g., 
given a set, choose another class member). Classification ability may also be tested 
in the coding problem, by varying the subtlety of the figure-coding system. 


It is apparent that one type of matrix problem is equivalent to a straightforward 
analogies test in square format. Thus two types of matrix subtests may be derived, 
those which use simple analogies (2 x 2 cells) and those which include a variety of 
completion systems (3 x 3 cells; e.g., addition of elements, Latin squares). 


A further problem to be resolved in the design of a non-verbal reasoning test is 
whether or not spatial ability should be measured. It must be accepted that by using 
figural materials which are transformed in various ways the test is likely to have at 
least a small spatial ability component. The question to be answered therefore
becomes: ‘‘To what extent should success on the test be dependent upon the ability
to create mental images of complex configurations, then to transform those images
and/or to compare them with similar external configurations?’’


In order to answer this question, it is necessary to consider the purpose of the 
proposed test. That is, if the test is being designed to measure non-verbal reasoning 
ability, the accurate use of reasoning processes should be both necessary and 
sufficient for success on any item. Whenever possible, the need for other processes 
should be avoided. 


It might be argued that a measure of spatial ability would be useful, since the 
proposed test will be used for school selection purposes and spatial ability tests 
appear to have predictive validity for mathematics and science courses (MacFarlane 
Smith, 1982). If this is the case, then it seems that a separate measure of spatial 
ability, in addition to that of reasoning ability, would be more appropriate. Scores 
obtained on a test which included items requiring high level reasoning and items 
requiring complex spatial discriminations or transformations would be difficult to 
interpret. It would not be possible to distinguish between a child who had good 
reasoning ability but poor spatial ability and a child with the reverse pattern. This 
would be a serious drawback if high reasoning ability was considered crucial for 
selection, but high spatial ability was not. 


Given the above discussion, it seems that items in a non-verbal reasoning test 
should minimise the involvement of spatial judgments in their solution. That is, in a 
multiple-choice format, the distinction between the correct choice and each of the 
distractors should not depend upon a difficult spatial discrimination. 


SUB-TYPE CLASSIFICATION OF ITEMS 


Once the types of items have been chosen which will constitute a new test, the 
test developer has the task of devising the individual items. Two main difficulties 
must be overcome. Firstly, the designer must ensure that the subtests (each 
containing items of one type) necessitate a wide range of techniques for their 
solution, not relying heavily on one or two operations. In the present context, this
could be exemplified by avoiding a series completion subtest which used ‘‘90°
rotation’’ excessively or a matrix subtest which consisted mainly of Latin square
patterns. Secondly, the designer must have some means of deciding what level of 
item complexity will be appropriate for the test’s intended purpose. In practice this 
is usually achieved by producing large numbers of items on the basis of tests which 
have been successfully used before, and then carrying out item trials on an appropriate
population.


Given these two problems, it would seem sensible to attempt an analysis and 
categorisation of item content so as to facilitate a more systematic method of item 
writing. In order to do this in the present case, the research work which has been 
conducted into the structure of non-verbal reasoning items will be reviewed. 


Most of the recent research work in this context has been carried out by
psychologists who have adopted the ‘‘cognitive components’’ approach to
reasoning. As stated in an earlier section, this approach is task analytic in nature.
Researchers attempt to analyse the information processing components which are
involved in the performance of standard reasoning test items. Models are then
formulated for each item type, specifying the item parameters which are thought to
affect performance, and the way in which such parameters relate to item difficulty
and/or solution times. These models are then tested and refined by empirical
analysis, until an adequate model is produced of the processing components underlying
performance on each particular item type. Such models can then be used to
investigate the sources of variation leading to individual differences in test
performance, with the ultimate aim of reaching a better understanding of what is
meant by ‘‘reasoning ability’’.


It is the models which are produced by the above research methods which are 
potentially useful to the test developer. That is, if the parameters determining item 
difficulty can be identified, and the way in which they affect item difficulty can be 
evaluated, then these same parameters can be used to analyse and categorise new 
items. It should eventually be possible to produce a specification for writing items of 
a required difficulty level, in terms of the relevant item parameters. At present, the 
state of research knowledge does not allow such precise specifications to be made. 
Even so, it will be valuable to review those findings which are available, so as to 
enable an initial analysis and categorisation system to be developed. If a consistent 
relationship was then found between item difficulty and the categorised parameters, 
this would support the research findings upon which the system was based. 


Analyses of non-verbal reasoning tasks have been largely confined to geometric 
analogies. The most extensive study was made by Sternberg (1977). He used verbal, 
pictorial and geometric items with a form of subtraction methodology in order to 
determine which processing components are used in solving analogies and what 
strategies are used to combine and operate the components. He concluded that the 
same processing components were used in all three types of analogy. If the analogy 
structure was ‘‘A → B : C → D’’, then the components were: encoding, inference
(determining the relationship between A and B), mapping (determining the relationship
between A and C), application (transforming C according to the inferred
rule(s)), justification (an optional component, used when no D option appears 
absolutely correct) and response. 


Sternberg assumed that a simple additive (linear) model was appropriate, 
wherein solution time would equal the sum of the durations of each component 
multiplied by the number of times each was carried out. Equally, the proportion of 
errors was predicted to be the sum of the difficulty of each component multiplied by 
the number of times it was carried out.


It can be seen from the above description that the effect of a change in item
structure would depend upon the duration or difficulty of the components which 
would be employed to process the changed structure. Sternberg found that in the 
case of geometric analogies, attribute comparison processes (inference, mapping 
and application) were particularly time-consuming. From this, it follows that 
solution time would increase as the number of transformations to be inferred (A to 
B) and applied (C to D) increased or as the number of elements to be mapped (A to 
C) increased. The main source of errors was also found to be in these components, 
so that error rates would increase similarly. 
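
The additive model described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the component names follow Sternberg's list (the optional justification component is omitted), while the durations and execution counts are invented placeholders and are not estimates from Sternberg (1977).

```python
# Sketch of a simple additive (linear) component model of solution time.
# All numbers are hypothetical placeholders, not values from Sternberg (1977).

def predicted_solution_time(durations, counts):
    """Sum over components of (duration of one execution x number of executions)."""
    return sum(durations[name] * counts.get(name, 0) for name in durations)

# Hypothetical duration (in seconds) of a single execution of each component.
durations = {
    "encoding": 0.5,
    "inference": 0.25,
    "mapping": 0.25,
    "application": 0.25,
    "response": 0.5,
}

# Hypothetical item: four terms encoded, two transformations inferred and
# applied, one element mapped, one response made.
counts = {"encoding": 4, "inference": 2, "mapping": 1, "application": 2, "response": 1}

# The same item with one extra transformation to infer and to apply.
harder_counts = dict(counts, inference=3, application=3)

total = predicted_solution_time(durations, counts)
harder_total = predicted_solution_time(durations, harder_counts)
print(total, harder_total)
```

Under this sketch, adding a transformation lengthens the predicted time only through the inference and application terms, which is the sense in which the effect of a structural change depends on the components it engages. The same function could be applied to error proportions by substituting component difficulties for durations.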


Unfortunately, Sternberg’s study did not include a systematic analysis of how 
item structure related to solution time or error rate. Noting this omission, 
Mulholland et al. (1980) carried out an empirical analysis of how response measures 
were affected by the number of elements in the first analogy term and the number of 
transformations required (to each element and in total) to obtain the second term. 
Subjects were asked to judge the truth or falsity of several hundred experimental 
analogies, which were specially generated to involve one to three elements and one to 
three transformations in various combinations. 


The results showed that solution time increased as the number of elements 
increased and as the number of transformations increased. The increase was linear 
for the number of elements when the number of transformations was held constant 
and vice versa. The two effects were not simply additive, however: as each factor
increased, the time increase per unit of the other factor also grew. The error rates
were found to be affected
by the number of transformations but not directly by the number of elements. 
However, the two factors were found to interact. Error rates were higher for 
multiple transformations of a single element than for an equivalent number of single 
transformations of different elements. 


Mulholland et al. were able to explain their results in terms of the concept of
‘‘working memory’’. As described by Baddeley (1976), the latter is a subsystem of the
limited capacity short-term memory. It is in working memory that processing of 
incoming information takes place, in combination with information retrieved from 
long-term memory. The capacity of this store is limited to approximately seven bits 
of information (Miller, 1956) in an adult, so that items requiring a greater memory 
load will have a far higher probability of information loss (and therefore error) than 
items which are solvable within this storage capacity. 


Mulholland et al. hypothesised that the non-additive increase in solution time 
found at high levels of item complexity was due to the time taken by processes 
designed to maintain and update the contents of working memory, as the latter 
approached its capacity limit. The high error rate for multiple transformations of a 
single element was interpreted in a similar way. That is, since there was no external 
source (i.e., answer option) against which to check intermediate stages in the item 
processing, it was necessary to store the results of each transformation in working 
memory. Also, when the transformations are being inferred, their ordering as well 
as their identity has to be stored. 


The role of working memory in the determination of item difficulty has been
supported by process analyses of verbal and numerical reasoning tasks. Kotovsky
and Simon (1973) found that working memory overload was the major source of
errors in the solution of letter series completion by adults. Holzman et al. (1983)
confirmed this finding with number series completion by both adults and 11-year-olds.
Using the same subjects, Holzman et al. (1982) found that performance upon
numerical analogies was also strongly related to the amount of information to be
stored and manipulated in working memory. The two studies by Holzman et al.
incorporated independent assessment of working memory capacity (backward digit
span). This was found to be significantly correlated with success on items hypothesised
to involve increasing demands upon working memory, supporting the
theoretical explanation being put forward.


These two studies also found that the 11-year-olds were more adversely affected 
by larger demands upon working memory than were adults. The reason why 
children perform less well on tests of working memory is not clear (see Siegler and 
Richards, 1982). Whatever the reason for this apparently lower capacity, it is clear 
that working memory load may be a particularly important factor underlying item 
difficulty when children are being tested, as in the present case. It would therefore 
seem profitable to develop an index of working memory load, whenever possible, as 
part of the analysis and categorisation scheme. 


In addition to the working memory considerations, the verbal and numerical
reasoning analyses of Kotovsky and Simon (1973) and Holzman et al. (1982, 1983)
produced other results of relevance to non-verbal reasoning tasks. The analysis of
series completion items identified the discovery of periodicity as an important stage
in item solution. The need to detect a relationship between non-adjacent elements
was found to be associated with a lower success rate than that achieved when
adjacent elements were related, particularly for the children. Hierarchical or
complex relations were found to be more difficult for 11-year-olds than for adults,
with average IQ children performing less well than high IQ children. This effect
occurred despite the fact that the component operations of which the complex
relations were composed were familiar to all subjects. Holzman et al. suggested that:
‘‘. . . The ability to put together familiar components in new combinations to
describe a pattern or reach a goal may be a critical aspect of intellectual maturity’’
(1983, p. 616).


This view is in agreement with Sternberg (1982) who argued that the ability to 
cope with novel or ‘‘nonentrenched’’ tasks may prove to be a crucial aspect of
intelligent behaviour. 


Given the results outlined above, it would seem advantageous to distinguish 
series items with a periodicity of greater than one, and also any reasoning items 
which rely upon complex relationships for their solution. 
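
One way of flagging series items with a periodicity greater than one is sketched below. It is an illustrative assumption rather than a procedure taken from the studies cited: each position in a series is represented by a single number (for example a rotation angle or an element count), and the period is taken to be the smallest spacing at which every interleaved sub-series changes by a constant step. The function names and the `min_cycles` parameter are hypothetical.

```python
# Sketch: detect the periodicity of a series coded as numbers.
# The numeric coding of figures and the constant-step rule are assumptions.

def _constant_step(seq):
    """True if successive differences in seq are all identical."""
    steps = {b - a for a, b in zip(seq, seq[1:])}
    return len(steps) == 1


def series_period(values, min_cycles=3):
    """Return the smallest period p such that each sub-series values[start::p]
    changes by a constant step, or None if no such period is found.

    min_cycles is the minimum length required of each sub-series, so that a
    constant step is observed at least twice when min_cycles is 3.
    """
    n = len(values)
    for p in range(1, n // min_cycles + 1):
        if all(_constant_step(values[start::p]) for start in range(p)):
            return p
    return None


adjacent = series_period([2, 4, 6, 8, 10, 12])      # relation between neighbours
interleaved = series_period([1, 10, 2, 20, 3, 30])  # two interleaved sub-series
irregular = series_period([1, 5, 2, 9, 7, 3])       # no regular pattern found
print(adjacent, interleaved, irregular)
```

Items for which the returned period exceeds one would be those requiring a relationship to be detected between non-adjacent elements.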


One further study which examined the relationship between non-verbal 
reasoning item parameters and difficulty levels was that by Whitely and Schneider 
(1981). This assessed performance upon geometric analogy items taken from the 
CAT (Thorndike and Hagen, 1974) in relation to various measures of item structure. 
The measures were chosen according to each of three mathematical models 
describing the relationship between item structure and item difficulty. The linear 
logistic latent trait model (Fischer, 1973) was used to examine the fit of experimentally-obtained
data to the item structure measures. The three models were progressively
more complex: the first assessed items according to the number of elements
and the number of transformations; the second did likewise but distinguished
between transformations which did or did not involve spatial displacement; the third
model distinguished between seven distinct types of transformation. The more
complex models were shown to produce a statistically better fit to the subject data,
though the third model was only slightly better than the second. Parameter estimates
from the second model appeared to show that increasing the number of displacement
transformations increased difficulty but that increasing other transformations
(size, shade, shape or number) had the reverse effect. 
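
The general form of a linear logistic latent trait model can be sketched as follows: item difficulty is expressed as a weighted sum of counted structural features, and that difficulty enters a Rasch-type logistic function of the difference between ability and difficulty. The feature names and weights below are hypothetical; only their signs are chosen to echo the pattern reported for the second model, and none of the values are Whitely and Schneider's estimates.

```python
import math

# Sketch of a linear logistic latent trait model (LLTM).
# Feature names and weights are illustrative assumptions only.

def item_difficulty(feature_counts, weights, constant=0.0):
    """Difficulty as a constant plus a weighted sum of structural feature counts."""
    return constant + sum(weights[f] * n for f, n in feature_counts.items())


def p_correct(ability, difficulty):
    """Rasch-type probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical weights, in logits per unit of each feature.
weights = {"elements": 0.25, "displacement": 0.5, "other_transformation": -0.25}

# Two hypothetical items with the same number of elements and transformations,
# differing only in whether the transformations involve spatial displacement.
non_spatial_item = {"elements": 2, "displacement": 0, "other_transformation": 2}
spatial_item = {"elements": 2, "displacement": 2, "other_transformation": 0}

d_non_spatial = item_difficulty(non_spatial_item, weights)
d_spatial = item_difficulty(spatial_item, weights)
print(d_non_spatial, d_spatial, p_correct(0.0, d_spatial))
```

In an actual analysis the weights would be estimated from response data rather than fixed in advance.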


Unfortunately, the method of data collection may have produced atypical data.
That is, the items were presented individually for a fixed interval equal to the mean
time per item for a normal CAT administration. This may have produced a set of
item difficulties which were primarily dependent upon response time. While Sternberg
(1977) noted that response time and error rates are highly correlated, it has been
shown by Mulholland et al. (1980) (see earlier discussion) that aspects of item structure
may differentially affect the two.


Despite this drawback, Whitely and Schneider’s study provides some evidence
that different types of transformation may yield different difficulty values, particularly
transformations which do or do not necessitate spatial displacements. As such, it 
would be advisable to distinguish between different types of transformation 
whenever the limits of a particular item categorisation system will allow. 


At present, no empirical studies of the relationship between item structure and 
difficulty have been reported for matrix items. However, two studies have attempted 
to categorise matrix items in terms of the types of transformations which they incorporate.
Jacobs and Vanderventer (1972) surveyed a wide range of reasoning items
and eventually derived a set of 12 transformations which they claimed could be used 
to classify the vast majority of items. These included changes in size, shade or shape, 
reflections, rotations and addition of elements in different cells. 


Ward and Fitzpatrick (1973) were motivated to develop a categorisation scheme
by the results of a pilot study of matrix completion. This seemed to show that item
difficulty was related to the number of cell attributes and the complexity of operations
required, a result akin to those described earlier for analogies and series items.
The scheme which they devised was quite complex, allowing the precise description
of how the cells in a matrix were related. Several types of patterning were
distinguished (e.g., ‘‘horizontal symmetry’’, ‘‘vertical array identities’’), as well as
types of cell combinations (e.g., ‘‘intersection of shape sets’’) and numerous types
of transformations (e.g., ‘‘seriation of growth’’, ‘‘rotation’’).


Whether or not these precise categories bear any differential relation to item 
difficulty is not yet known; it may be that more generic categories will suffice. Raven 
(1960) grouped his Standard Progressive Matrices into five categories, described as 
involving ‘‘continuous patterns’’, ‘‘analogies of figures’’, ‘‘progressive development
of figures’’ and ‘‘resolving figures into their constituent parts’’. Such a
categorisation, in combination with a measure of the number of element
transformations (‘‘working memory load’’) involved in a matrix item, may prove to be
appropriate for research purposes. However, a more precise specification of item types 
may be useful for ensuring that a set of items being generated utilises an adequate range 
of processes. 


In summary, the cognitive components literature suggests that it is possible to 
develop a system for categorising non-verbal reasoning items in terms of item parameters
possibly correlated with item difficulty. While the details of any particular
categorisation system will depend upon the practical limitations (e.g., number of 
parameters allowed), it may be useful to summarise here the recommendations 
which can be derived from this review. 


(1) There should be an index of the number of cognitive operations which the 
testee has to perform when carrying out the task. 


(2) Items depending upon complex or unusual relations should be distinguished 
from those which do not. 


(3) Series completion items incorporating periodicity should be distinguished 
from those which do not. 


(4) Whenever possible, different types of transformation should be distinguished,
particularly those which do or do not involve spatial displacement.


The utility of these recommendations can be illustrated by considering the ‘‘odd one
out’’ type of classification item (see Appendix 1). These items do not involve any
transformations which could be enumerated in the way that analogies and series 
items do. However, if several of these items are examined, it can be seen that they 
differ in the number of attributes which are not constant across all the figures in an 
item. That is, easier items contain five figures which are the same as each other in all 
but one attribute, which distinguishes one figure from the other four. The more 
difficult items use a set of figures which vary in several attributes, only one of which 
distinguishes an ‘‘odd one out’’. The testee has to identify and examine each of these 
attributes until a solution is found. For this reason, the number of non-constant 
attributes may be used as a similar index to the ‘‘number of operations’’ measure 
which would be appropriate for analogy or series item types. 


Two more of the recommendations listed above can be combined to produce a
second categorisation of these items. Firstly, items which involve a complex concept
such as topological equivalence or symmetry can be distinguished from those which
rely upon a basic attribute such as shape, shade or size. Secondly, within the ‘‘basic
attribute’’ category, those items which depend upon spatial orientation can be
distinguished from those which do not (see Figure 3). In this way, ‘‘odd one out’’
items may be classified by a two-digit code: the first digit indicates the number of
variable attributes in the five figures and the second takes one of three values,
indicating the use of simple non-spatial, simple spatial or complex concepts. Classification
systems may be developed in a similar way for other item types, by adapting
the recommendations derived in this paper to specific cases.


FIGURE 3
CATEGORIES OF ‘‘ODD ONE OUT’’ ITEMS
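
The two-digit code for ‘‘odd one out’’ items can be sketched as follows. The representation of each figure as a set of named attributes, the attribute names, and the numeric values assigned to the three concept categories are all assumptions made for illustration; the concept category is supplied by the item writer rather than computed.

```python
# Sketch of a two-digit sub-type code for "odd one out" items.
# The digit values for the three categories are assumed, not taken from the paper.
SIMPLE_NON_SPATIAL, SIMPLE_SPATIAL, COMPLEX = 1, 2, 3


def count_variable_attributes(figures):
    """Number of attributes that are not constant across all figures in an item."""
    attributes = figures[0].keys()
    return sum(1 for a in attributes if len({fig[a] for fig in figures}) > 1)


def odd_one_out_code(figures, concept_category):
    """Return (number of variable attributes, concept category)."""
    return (count_variable_attributes(figures), concept_category)


# Hypothetical item: shape singles out the fourth figure, shade varies without
# singling out any one figure, and size is constant.
item = [
    {"shape": "circle", "shade": "black", "size": "small"},
    {"shape": "circle", "shade": "white", "size": "small"},
    {"shape": "circle", "shade": "black", "size": "small"},
    {"shape": "square", "shade": "white", "size": "small"},
    {"shape": "circle", "shade": "white", "size": "small"},
]

code = odd_one_out_code(item, SIMPLE_NON_SPATIAL)
print(code)
```

Here the item has two variable attributes and relies on a simple non-spatial attribute, so it would be harder than an item in which shape alone varied.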


The sub-type coding systems which have been derived in the above way can be 
assessed in two ways. The practical utility can be judged by the extent to which item 
generation is facilitated and rationalised by adoption of the system. The system will 
be shown to be of theoretical value if item difficulty is found to be systematically
related to the sub-type code values; that is, if the coding system identifies the correlates
of item difficulty. If this is the case, then such information can be used at the
practical level, facilitating the generation of items at any required level of difficulty. 
Subsequent refinements of the coding system can be made on the basis of these 
assessments. 
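
A first check on the theoretical value of a coding system might simply compare mean item facility across code values after an item trial. The sketch below assumes that each trialled item is recorded as a pair of code value and proportion correct; the figures shown are invented placeholders and not results from any trial.

```python
from collections import defaultdict
from statistics import mean

# Sketch: average item facility (proportion correct) for each sub-type code value.
# The trial data below are invented placeholders, not real results.

def mean_facility_by_code(trial_results):
    """Group (code, facility) pairs by code and average the facility values."""
    grouped = defaultdict(list)
    for code, facility in trial_results:
        grouped[code].append(facility)
    return {code: mean(values) for code, values in grouped.items()}


# (number of variable attributes, proportion of testees answering correctly)
trial_results = [(1, 0.75), (1, 0.5), (2, 0.5), (2, 0.25), (3, 0.25), (3, 0.25)]

summary = mean_facility_by_code(trial_results)
print(summary)
```

A steady fall in mean facility as the code value rises would be consistent with the code identifying a correlate of difficulty; a flat or irregular pattern would suggest that the code needs refinement.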


CONCLUSION 


In conclusion, the purpose of this review has been to show the utility of 
analysing ability test items in terms of their information processing demands. The 
use of this type of technique would mean that ability tests are more rational in the 
processes they measure and more varied in their content than has previously been the 
case. That is, it will provide a theoretical basis for judging if item types or items 
within types are necessitating use of the intended processes. Additionally, the use of
a system for generating trial items with varied content (i.e., shapes, transformations,
relationships) and required difficulty levels should reduce the number of unsatisfactory
items in a trial test. Thus, a wider choice of items will be available for final
test forms, giving assessments which are likely to be less repetitive and less tedious to 
complete. Finally, such a system will also facilitate the production of parallel forms, 
by allowing explicit matching of process parameters, rather than relying upon 
statistical and visual equating.


APPENDIX 1
EXAMPLES OF POTENTIAL ITEM TYPES (abbreviated instructions)


1. Classification: 


(a) ‘‘Unlike’’/‘‘Odd One Out’’ (see Figure 4)
One of the five figures is unlike the other four. Draw a circle around the letter under it. 


FIGURE 4
CLASSIFICATION — ‘‘UNLIKE’’


(b) ‘‘Like’’ (see Figure 5)
The three figures on the left are alike in some way. Choose one of the five figures on the 
right which is most like the first three. 


FIGURE 5
CLASSIFICATION — ‘‘LIKE’’


2. Series Completion (see Figure 6)
The five squares on the left are arranged in order. Choose which of the five figures on the right 
should go in the empty square. 


FIGURE 6
SERIES COMPLETION


3. Series Ordering (see Figure 7) 
The five figures could be arranged in order. Think of them arranged in order and choose the 
one which would be in the middle. 


FIGURE 7
SERIES ORDERING


4. Digit Symbol Pairing (see Figure 8)
Write the correct number under each symbol, as quickly as possible.


FIGURE 8
DIGIT SYMBOL PAIRING


5. Matrix Completion (see Figure 9)
One of the nine small squares in the big square is empty. Choose which of the five squares on
the right should fill the space.


FIGURE 9
MATRIX COMPLETION


6. Form Reasoning (see Figure 10)
The answers to the first four lines are given. Use them to work out the answer to the fifth line.


FIGURE 10
FORM REASONING


7. Locations/Circle Reasoning (see Figure 11)
On each row, one circle has been blackened according to a rule. Decide what the rule is and
blacken one circle on the fifth row according to that rule.


FIGURE 11
LOCATIONS


8. Codes/‘‘Secret Languages’’ (see Figure 12)
The figures on the left each have a code. Decide what the code letters mean and then choose
the correct code for the figure in the middle.


FIGURE 12
CODES


9. Analogies (see Figure 13)
On the left are two figures which go together to make a pair. Then there is a third figure and a
choice of five more. Choose which of the five makes a pair with the third figure, like the first
pair.


FIGURE 13
ANALOGIES


APPENDIX 2
COMPOSITION OF PUBLISHED NON-VERBAL REASONING TESTS
I Tests Appropriate for Eleven-year-olds:


Non-Verbal Test 1 (Jenkins, J. W.; 1947-59) Ages 10 to 12-11 years. 
Subtests: 
I Classes — ‘‘Like’’ (given 3, choose 1 from 5); 
II Series Completion — insert or continue; 
III Classes — ‘‘Unlike’’; 
IV Ordering — middle of series; 
V Matrices — 2 x 2. 


Non-Verbal Test 2 (Lee, D. M., and Jenkins, J. W.; 1948-51) Ages 10 to 12-11 years.
Subtests:
I to V As in N-V Test 1.


Non-Verbal Test 3 (Calvert, B.; 1953-58) Ages 10 to 15 years; also called Non-Verbal Test DH.
96 items — mixture of 75 matrices + 21 series completion (continue), includes numerical,
figural and continuous pattern material. Matrices — continuous pattern, 2 x 2, 3 x 3,
4 x 4.


Non-Verbal Test 4 (NFER; 1951-60) Ages 12 to 13-11 years. 
4 series of 3 types of item:
(i) figural analogies;
(ii) letter/number series completion — continue/insert;
(iii) classes — ‘‘unlike’’.


Non-Verbal Test 5 (Pidgeon, D. A.; 1953-65) Ages 8 to 11 years; also called Non-Verbal Test
BD.
Subtests:
1. Cypher — Symbol/Digit pairing, from 3 x 3 number grid;
2. Similarities — (Classes — ‘‘Like’’; given 3, choose 2 from 7);
3. Analogies — figural;
4. Series — completion — insert or continue.


AH1X and AH1Y (Heim, A. W., Watts, K. P., and Simmonds, V.; 1977) Ages 7-11 years.
Subtests (X & Y parallel forms):
S Series completion — continue;
L ‘‘Likes’’ (given 2, choose 1 from 6);
A Analogies;
D ‘‘Differents’’ (Classes ‘‘Unlike’’ — given 2, choose 1 from 6 which is ‘‘unlike’’).
Uses 50% pictorial and 50% figural material.


AH2 and AH3 (Heim, A. W., Watts, K. P., and Simmonds, V.; 1974-8) Ages 10 years to
adult.
3 sections — verbal, numerical and pictorial.
Pictorial section — ‘‘spiral omnibus’’ of Likes, Analogies, Differents and Series, as in
AH1X and Y (alternate pictorial/figural material).


AH4 (Heim, A. W.; 1968-1975) Ages 10 years to adult.
2 parts, verbal/numerical and ‘‘diagrammatic’’.
Part II: ‘‘spiral omnibus’’ of:
(i) Figural analogies;
(ii) Figural identity (‘‘is the same as’’);
(iii) Figural ‘‘subtraction’’;
(iv) Figural series completion — continue;
(v) Superimposition of figures (‘‘A on B gives outline . . .’’).
All multiple choice 1 from 5.


Otis-Lennon School Ability Test (Otis, A. S., and Lennon, R. T.; 1967-79).
Intermediate Level. Ages 10 to 15 years.
‘‘Spiral omnibus’’ form — mixed verbal, numerical and figural.
Non-verbal material:
(i) figural analogies;
(ii) series completion — insert and continue — figural, letters, numbers;
(iii) matrix completion — 2 x 2, 3 x 3 — figural, letters, numbers.


Culture Fair Intelligence Test (Cattell, R. B., and Cattell, A. K. S.; 1949-63) Scale 2: Ages 8 
years to adult 

Subtests: 

1. Series completion — continue; 

2. Classes — ‘‘Unlike’’; 

3. Matrices — 2 x 2, 3 x 3;

4. Conditions — Topology (given a figural arrangement, choose 1 from 5, as equivalent). 
All items figural 


Standard Progressive Matrices (Raven, J. C.; 1958) Ages 6 years to adult. 
5 sets of Matrices: 

A. Continuous patterns; 

B. 2 x 2 ‘‘analogy’’ form;

C. 3 x 3 ‘‘progressive development of figures’’;

D. Patterns and permutations; 

E. Combination of cells, or ‘‘resolving matrices into constituent parts’’. 


Children’s Abilities Scales (Childs, R.; 1982) Ages 11 to 12-6 years. 

6 tests in pairs, yielding 3 scales — verbal, spatial, non-verbal. 

Non-verbal tests. 

7. ‘‘Symbols’’ — matrices 2 x 2, 2 x 3, 3 x 3 — Progressions, patterns.
8. ‘‘Shapes’’ — matrices 2 x 2, 2 x 3, 3 x 3 — Rotations, reflections.
All items figural. 


Cognitive Abilities Test (Thorndike, R. L., and Hagen, E.; 1973) Ages 8 to adult. Multi-level
test (8 levels); Level C — ages 10-12 years. 10 tests arranged in 3 batteries — verbal,
quantitative, non-verbal.
Non-verbal:
(i) Figure classification (‘‘Like’’ — given 3, choose 1 from 5);
(ii) Figure analysis (Analogies);
(iii) Figure synthesis (given ‘‘pieces’’, determine which ‘‘shapes’’ they could have made).


Differential Aptitude Tests (Bennett, G. K., Seashore, H. G., and Wesman, A. G.; 1947-62) 
Ages 11 years to adult. 
8 tests, including ‘‘Abstract Reasoning’’ — series completion — continue. 


II Other Tests and Test Collections surveyed:
Kit of Factor-Referenced Cognitive Tests (Ekstrom, R. B., French, J. W., and Harman, H.
H.; 1976).


Induction Factor:
(i) Letter sets — ‘‘Unlike’’ (1 set from 5 sets);
(ii) Locations — (marks on rows of places/gaps, according to rule);
(iii) Figure Classification — (given 2 or 3 sets, assign figures).
Logical Reasoning Factor:
(iv) Deciphering languages — (given coding examples, code new material).


“Structure of Intellect” Model Tests (Guilford, J. P.; 1967). 
Tests of Reasoning Ability: 
(i) Figure Classification — (Assigning given figures to given sets); 
(ii) Letter Triangle — (2 dimensional letter series); 
(iii) Figure Matching — (Classes — ‘‘Like’’, given 1 figure, choose most similar from 5);
(iv) Figure Exclusion — (Classes — ‘‘Unlike’’);
(v) Matrices — 3 x 3, several empty cells;
(vi) Figure Analogies;
(vii) Secret Writing — (Coding problem, order of code letters not constant);
(viii) Form Reasoning — (Figural ‘‘equations’’);
(ix) Circle Reasoning (circles — dashes in rows, one circle blacked, according to rule).


Biocultural Test of Non-Verbal Reasoning (Toronto, A. S.; 1977)
Subtests:
(i) Differences (Classes — ‘‘Unlike’’);
(ii) Similarities (Classes — ‘‘Like’’, choose most similar pair from 4);
(iii) Analogies.
Mixture of pictorial and figural material.


Educational Abilities Scales (Stillman, A., and Whetton, C.; 1982) Ages 13 to 15 years.
Symbolic Reasoning Scale: 
“Codes” Test — (given letters to figures coding system, give code for new figure). 


Br. J. educ. Psychol., 56, 138-149, 1986 


THE STRUCTURE OF ABILITIES IN VISUALISING THE 
ROTATION OF THREE-DIMENSIONAL STRUCTURES 
PRESENTED AS MODELS AND DIAGRAMS 


By G. M. SEDDON AND R. G. MOORE
(University of East Anglia, Norwich) 


Summary. The aim of this investigation was to determine the structure of abilities under- 
lying the visualisation of rotations about the three Cartesian axes for three-dimensional 
structures presented as models, on the one hand, and diagrams, on the other. The investi- 
gation was carried out by administering a test as two versions which required students to 
visualise the same rotations of the same structures presented as models and diagrams 
respectively. There were no factors common to the rotations with models and diagrams. 
The resulting factor pattern for models had two factors, one characterising rotations 
about both the X- and Y-axes, and one characterising rotations about the Z-axis. The 
pattern for diagrams had three factors, one for rotations about each Cartesian axis.
Theoretical and educational implications are discussed.


INTRODUCTION 


AT all educational levels, diagrams and models are used as teaching aids to represent
three-dimensional objects or structures. Moreover, the diagrams are drawn, and the 
models are studied, from a variety of different viewpoints. It is therefore essential 
that students should be able to recognise that diagrams drawn from different view- 
points can represent the same structure, and that different views of the same model 
can all represent the same structure. Alternatively, the tasks may be described as 
visualising how diagrams of the same structure should be drawn, or how the models 
should be orientated to represent the effects of rotating the structure. Expressed in 
this alternative way, the tasks are seen to be fundamental to many aspects of science, 
where such skill is essential to the mastery of important spatial concepts. 


Most of the previous research on these tasks has concerned the visualisation of 
rotations with reference to diagrams. Morris and Hampson (1983, pp. 186-192) have 
discussed the general psychological issues involved, while Eliot and Hauptman 
(1981) have reviewed research with particular reference to science education. 
Overall, it would seem that the general standard of performing this task is unsatis- 
factory. Even high ability science students at sixth form and university level have 
great difficulties (Seddon, Tariq and Santos Veiga, 1982). Therefore, in order to 
rectify this problem, it is necessary to acquire a better understanding of the psycho- 
logical nature of the task. 


From a theoretical point of view, it would seem reasonable to suggest that the 
ability to visualise rotations of a structure from diagrams is a compound of two 
components. One component is the visualisation of the effects of rotating a struc- 
ture, when it appears in three-dimensional terms. The second component is under- 
standing how the structure is represented diagrammatically. If this model of the 
overall process is correct, the task of visualising rotations from models should have 
an underlying ability which is also required in visualising the same rotation in the 
corresponding diagram. There should also be an additional ability involved in 
visualising rotations from diagrams. 


As yet the only relevant empirical evidence is found in an investigation by 
Shepard and Metzler (1971) in which no significant difference was found in the 
speed of visualising rotations based on diagrams and models. Thus Metzler and 
Shepard (1974) suggest that, in visualising rotations in relation to diagrams, subjects
operate on visual images which are more like the three-dimensional objects them- 
selves rather than the diagram, and that the process of mentally rotating these 
internal representations is analogous to the physical rotation of the real objects. 
Another possible implication of this finding is that a common underlying ability is 
involved in both tasks. 


In order to throw more light on this issue, the first aim of this investigation is to 
examine the structure of abilities underlying the visualisation of rotations in 
diagrams and models. More specifically, it is intended to determine which of the 
three factor patterns in Table 1 actually represents the structure of abilities involved 
in these two tasks. 


TABLE 1

THREE PATTERNS OF SIGNIFICANT LOADINGS WHICH
COULD ARISE FROM TASKS INVOLVING MENTAL
ROTATION OF THREE-DIMENSIONAL STRUCTURES
PRESENTED AS MODELS AND DIAGRAMS

Format for presenting structure     Factor
                                    1 2
Pattern A
Diagrams +
Models +
Pattern B
Diagrams + +
Models +
Pattern C
Diagrams +
Models +

FIGURE 1

EXAMPLES OF THE FOUR DEPTH CUES USED IN THE DIAGRAMS

[Labels in figure: relative size cue; overlap cue; foreshortened line; distortion of angles]

(In the actual model being represented all the spheres are the same size,
all the rods are the same length, and all the angles between adjacent rods
are 90°.)

As outlined by Gibson (1966) in theoretical terms, the task of understanding
how structures are represented diagrammatically requires students to understand the
significance of the cues which are deliberately manipulated by the artist to portray
depth relationships. For example, Figure 1 shows that the impression of depth is
created through manipulating the relative sizes of the different circles, the extent of
the overlapping of circles by lines, the foreshortening of lines, and the distortion of
angles. Moreover, the ability to visualise rotations from diagrams has been shown 
empirically to be dependent on the ability to respond correctly to these four depth 
cues (Seddon, Eniaiyeju and Jusoh, 1984; Seddon and Eniaiyeju, 1985). 


FIGURE 2

THE EFFECTS OF ROTATING A THREE-DIMENSIONAL STRUCTURE ABOUT THE THREE CARTESIAN
AXES

[Panels: original orientation; rotation about X-axis; rotation about Y-axis; rotation about Z-axis]


Figure 2 shows that the effect of rotating about the X- and Y-axes is to produce
changes in the configurations of each of these four depth cues. As a result, the
profile of the diagram drawn after rotation is not superimposable on that of the
diagram drawn before rotation. In contrast, rotation about the Z-axis results in
virtually no such changes, and the effects of rotating the structure through a
particular angle are represented simply by rotating the diagram itself through the
same angle in the plane of the paper. Thus the visualisation of rotating about the X- and
Y-axes requires students to know how the configuration of the depth cues should
change. On the other hand, visualising rotations about the Z-axis can be accom- 
plished, in theory, without understanding the significance of the depth cues at all. 
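
The geometric distinction drawn above can be checked numerically. What follows is only a minimal sketch, assuming an orthographic projection onto the plane of the paper (the diagrams in the test were drawn in perspective, so this is a simplification) and a hypothetical four-ball structure. The function and variable names are illustrative and are not taken from the original study.

```python
import numpy as np

def rot_x(theta):
    """Rotation matrix for an angle theta (radians) about the X-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the Z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_2d(theta):
    """Rotation of a flat diagram in the plane of the paper."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def project(points):
    """Orthographic projection onto the page: drop the Z coordinate (assumption)."""
    return points[:, :2]

# Hypothetical structure: a central ball at the origin and one ball along each axis.
structure = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
theta = np.deg2rad(40)

# Rotation about Z: the new diagram is the old diagram turned in the plane of the paper.
diagram_after_z = project(structure @ rot_z(theta).T)
diagram_turned_on_page = project(structure) @ rot_2d(theta).T
z_is_plane_rotation = np.allclose(diagram_after_z, diagram_turned_on_page)

# Rotation about X: the projected rod lengths change, so the profile is not superimposable.
rod_lengths_before = np.linalg.norm(project(structure), axis=1)
rod_lengths_after_x = np.linalg.norm(project(structure @ rot_x(theta).T), axis=1)
x_changes_profile = not np.allclose(rod_lengths_before, rod_lengths_after_x)

print(z_is_plane_rotation, x_changes_profile)
```

Under these assumptions the Z-axis case reduces to a rigid turn of the drawing, whereas the X-axis case alters the foreshortening of the rods, which is the formal difference the text relies on.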


As regards empirical evidence concerning the distinction between the psycho- 
logical properties of visualising rotations about the X- or Y-axes, on the one hand, 
and the Z-axis, on the other, successful attempts to teach students to visualise rota- 
tions about the X- and Y-axes had no effect on the visualisation of rotation about 
the Z-axis, and vice versa (Seddon, Tariq and Santos Veiga, 1984). Also, factor 
analytic studies have always produced a factor for visualising rotations about the 
Z-axis which is different from those observed for rotations about the X- and Y-axes 
(Seddon, Tariq and Santos Veiga, 1982; Seddon, Eniaiyeju and Chia, 1985). Thus 
these results are in line with the formal distinction which can be made in terms of the 
depth cues for rotations about the Z-axis. 


On comparing the tasks of visualising rotations about the X- and Y-axes, the 
fact that students must understand how depth cues change in each case suggests that 
there may be an underlying ability common to visualising both types of rotation. In 
fact, the empirical evidence reveals different results from different investigations. 
For example, Seddon, Tariq and Santos Veiga (1984) found that there was no 
transfer between the ability to visualise rotations about these two axes, when the 
students were given instruction in visualising rotation about only one. The factor 
analytic investigations of Seddon, Tariq and Santos Veiga (1982), as well as of 
Seddon, Eniaiyeju and Chia (1985) variously revealed separate factors for the X-
and Y-axes with some samples of students, but a factor common to both axes with
others.


Thus, on moving from one sample of students to another, it is possible that 
there are different types of cognitive functioning in the way that students visualise 
rotations about these two axes. Moreover, bearing in mind that these different 
results were obtained with students from different countries (i.e., Pakistan, Cape 
Verde, Portugal, England, Singapore), it is possible that cultural effects are 
responsible. 


For visualisations of rotations based upon three-dimensional models of the 
structures, it is not easy to predict what differences and similarities there may be in 
the abilities required for rotations about the three axes. Since there are no symbolic 
depth cues involved, they cannot create any differences between the abilities 
required for rotations about the various axes. There may therefore be one ability 
which underlies rotations about all three axes. However, whereas rotations about 
the X- and Y-axes produce changes in the profile of the structure as projected on to 
the eye of the observer, rotations about the Z-axis produce no such changes. This 
formal difference may therefore separate the abilities involved in visualising rota- 
tions about the X- and Y-axes from those required for rotations about the Z-axis. By 
the same token, this formal similarity between rotations about the X- and Y-axes 
would imply that rotations about these two axes have a common underlying ability. 


Unfortunately there is no empirical evidence to throw further light on this issue. 
Therefore, the second aim of this investigation is to determine and compare the 
structures of the abilities involved in visualising rotations with reference to both 
diagrams and models of the same three-dimensional structures. More specifically, 
bearing in mind the results of the previous work on diagrams, it is intended to deter- 
mine which of the three factor patterns schematically represented in Table 2 applies 
to visualising rotations from models and from diagrams. 
TABLE 2

THREE PATTERNS OF SIGNIFICANT LOADINGS WHICH
COULD ARISE FROM TASKS INVOLVING MENTAL
ROTATION OF THREE-DIMENSIONAL MODELS ABOUT
THE CARTESIAN AXES

Axis for rotation      Factor
                       1 2 3
Pattern A
X +
Y +
Z +
Pattern B
X +
Y +
Z +
Pattern C
X +
Y +
Z +

METHOD 


The basic plan was to construct two tests which tested the ability to visualise 
rotations of three-dimensional structures about the three Cartesian axes. The tests 
were identical in all respects apart from the fact that in one (The Diagrams Test) the 
rotations were visualised from diagrams, and in the other (The Models Test) the 
rotations were visualised from models. Both tests were to be administered to the 
same students and, in order to control for possible sequence effects, half the 
students were to take the Diagrams Test first, and half the Models Test first. The 
results of all the tests were then to be subjected to factor analysis to determine the 
overall structure of abilities for both types of test across all three axes of rotation. 


Subjects

The students (N = 266) comprised all the students in the age group 14-15 years 
from two all-ability mixed comprehensive schools in Norwich. The sexes were 
almost equally represented. 


The tests 

These were criterion-referenced tests based, with modifications, on that 
originally developed by Seddon, Tariq and Santos Veiga (1982). The modifications 
took the form of a simpler and more suitable wording of the questions for these 
younger students. 


All of the items were of the type illustrated in Figure 3. Thus the student was 
required to look at a diagram or model of a three-dimensional structure, and select 
which one of four other diagrams or models, respectively, could represent the 
structure, if the structure were rotated about a specified Cartesian axis. 


The structures were similar to those represented in Figure 3, and each had a 
central ball connected by rods to between two and six surrounding balls. All the balls 
FIGURE 3
THE INSTRUCTIONS AND EXAMPLES OF THE QUESTIONS USED IN THE DIAGRAMS TEST

[Panels: Turning around the X axis; Turning around the Y axis; Turning around the Z axis]

Note: The X axis and the Y axis are flat on the page. The Z axis is
coming out of the page towards you.

For each question, pick the drawing 2, 3, 4 or 5 which shows what
structure 1 would look like if you turned it around the axis indicated.

1 Turn Structure 1 around the Y axis
2 Turn Structure 1 around the Z axis
3 Turn Structure 1 around the X axis

were of the same diameter, and the connecting rods of the same length. The
complete set of configurations used in the test is illustrated in Figure 4. To a chemist,
these diagrams represent the three-dimensional structures of molecules. However, it
is by no means necessary to appreciate this point in order to answer the questions
correctly.


FIGURE 4
THE COMPLETE SET OF STRUCTURES USED IN THE TESTS

The orientation used for presenting diagram or model 1 was chosen
arbitrarily. In presenting diagrams or models 2-5, three orientations were derived by 
rotating the structure from its initial orientation about the X-, Y-, and Z-axes 
respectively. The fourth orientation was obtained by rotating the structure about a 
randomly chosen pair of these axes in sequence. The order for presenting diagrams 
or models 2-5 in each question was determined independently for each item on a 
random basis for each test. 


The 30 questions were presented in a different random order for the Diagrams 
and Models Tests. 


The Diagrams Test. The diagrams were drawn using a computer programme 
specially written to provide diagrams which are accurate in terms of perspective. In 
answering each question, the student was always able to refer to diagrams and a note 
on the inside cover of the test booklet indicating the orientation of the three axes. As 
shown in Figure 3, these diagrams also incorporated arrows, which were intended to 
explain the precise nature of the rotations to be visualised. 


Before the students began the test, the supervisor read out the simple 
procedural instructions, which also emphasised that the diagrams represented three- 
dimensional structures. Then, after the supervisor had demonstrated the orientation 
of each axis using a large model of labelled Cartesian axes, the students answered the 
questions under normal examination conditions, using a specially prepared answer 
card. There was no time limit, but all students finished in less than one hour.


The Models Test. The models were made from plastic balls 6.9 mm in diameter,
and stainless steel rods 1 mm in diameter. The distance between the circumferences
of the connected balls was 13 mm. All the models for a particular question were
fixed in the required orientations on a shelf. The shelves were mounted in turn on a
vertical display board at a height at which a student of average height would have a 
horizontal line of vision. Taller students were required to bend slightly in order to 
achieve this viewpoint, and for shorter students this viewpoint was achieved with the 
aid of a stool on which they stood. The questions were printed in large letters, 
directly above each set of models, and fitted to the top of the display boards was a 
large labelled model of the Cartesian axes, to which the students could always refer 
to establish the orientation of the X-, Y-, and Z-axes. 


In administering the test, each student was given an individual verbal explana- 
tion of the procedure using a practice item. The student then worked along the line 
of shelves answering the questions using a specially prepared answer sheet. In fact, 
the students required 15-25 minutes to complete the whole test. 


Administration 

With three complete sets of models in operation simultaneously, it required a 
total period of three weeks to administer the Models Test to every student in each 
school using convenient slots in the timetable. The Diagrams Test was administered 
in normal classes over a period of one week. 


The factor analysis 

The overall approach was to extract principal components from the scores of 
subsets of items corresponding to each of the different types of item. These com- 
ponents were then to be rotated to the Varimax criterion. 
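
The two steps named above, principal component extraction followed by Varimax rotation, can be sketched as follows. This is a minimal illustration and not the authors' own program: the data are synthetic, the three-factor structure is hypothetical, and the function names are assumptions introduced here. The rotation uses a common SVD-based iteration for the Varimax criterion.

```python
import numpy as np

def principal_component_loadings(corr, n_components):
    """Loadings of the leading principal components of a correlation matrix.

    Working from the correlation matrix itself means unities in the diagonal.
    """
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    explained = eigvals[order].sum() / corr.shape[0]  # proportion of total variance
    return loadings, explained

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal rotation of a loading matrix towards the Varimax criterion."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated * (rotated**2).sum(axis=0) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Synthetic sub-scores (hypothetical): 266 students, 12 sub-scores, 3 latent abilities.
rng = np.random.default_rng(0)
latent = rng.normal(size=(266, 3))
pattern = np.repeat(np.eye(3), 4, axis=0)  # each latent ability drives 4 sub-scores
scores = latent @ pattern.T + 0.5 * rng.normal(size=(266, 12))

corr = np.corrcoef(scores, rowvar=False)
unrotated, explained = principal_component_loadings(corr, 3)
rotated = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, the communality of each sub-score (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors is simplified.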


In all the factor patterns which might be anticipated in the light of the 
theoretical discussion, the number of factors which would be significant varies from 
two to six. This latter number would arise, for example, if the Diagrams Test and 
Models Test each yielded three different factors corresponding to visualising
rotations about a different axis. Therefore, in order not to prevent such a factor pattern
from being revealed, it is essential to examine the solution extracting six 
components, irrespective of what solution may be preferred using rules of thumb, 
such as the Kaiser and Scree tests. 


While the observation of six significant factors could imply that there are 
different abilities underlying each type of item, the same result could equally well 
arise from six sets of items where the scores had been allocated to the students purely 
on a random basis. In such circumstances, it is therefore important to guard against 
wrongly concluding that different sets of items measure different things, when in 
fact they measure nothing at all. In practice, this was achieved by randomly 
allocating the items for each test and each axis of rotation to two subsets, a and b, 
and performing the analysis on the total of 12 sub-scores for each student. Then, if 
the items corresponding to the same test and axis of rotation are measuring the same 
thing, one of the resulting six factors will be loaded on both corresponding sets of 
items. In contrast, if the scores are just a random collection of numbers, the 
loadings will not be arranged in such a way that two subsets of items corresponding 
to the same test and the same axis of rotation will have significant loadings on the 
same factor. 
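
The random allocation described above can be sketched as follows. This is a hedged illustration only: the responses are randomly generated placeholders, and the figure of 10 items per axis is an assumption (the text states only that each test had 30 questions covering three axes). All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_students, n_tests, n_axes = 266, 2, 3
items_per_axis = 10  # assumption: 30 questions per test shared equally among 3 axes

# Placeholder right/wrong (1/0) responses: students x tests x axes x items.
responses = rng.integers(0, 2, size=(n_students, n_tests, n_axes, items_per_axis))

sub_scores = []
labels = []
for t, test in enumerate(["Diagrams", "Models"]):
    for a, axis in enumerate("XYZ"):
        # Randomly allocate the items for this test and axis to subsets a and b.
        order = rng.permutation(items_per_axis)
        half = items_per_axis // 2
        subset_a, subset_b = order[:half], order[half:]
        sub_scores.append(responses[:, t, a, subset_a].sum(axis=1))
        sub_scores.append(responses[:, t, a, subset_b].sum(axis=1))
        labels += [f"{test} {axis} a", f"{test} {axis} b"]

# One row per student, one column per sub-score: the input to the factor analysis.
sub_scores = np.column_stack(sub_scores)
print(sub_scores.shape, labels)
```

With purely random responses such as these, the two halves of a pair would not be expected to load on the same factor, which is the safeguard the allocation is designed to provide.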





RESULTS 
TABLE 3
SUMMARY OF THE SCORES OBTAINED ON THE TESTS

Test                 Axis of rotation
                     X      Y      Z
Diagrams    x̄       4.5    6.0    5.1
            s        2.3    2.2    2.2
Models      x̄       6.8    7.1    5.6
            s        2.3    2.3    2.4





Table 3 summarises the scores on each of the two tests for the items concerned 
with each of the three axes. On subjecting these scores to an analysis of variance, 
involving those design factors on which repeated measures were made for the factors 
test mode and axis of rotation, the two main effects and their first order interaction 
were found to be significant at the 0.001 level, no matter which test was
administered first. On examining these effects in more detail, the significant difference due
to test mode was found to occur for rotations about the X- and Y-axes, but not 
about the Z-axis. 


In the absence of any information on the test-retest reliability of each of the 12
subgroups of items, the factor analyses were performed with values of unity in the
diagonal of the correlation matrix. On examining all solutions obtained by
extracting one to six principal components, it is seen that only five factors are
significant. The results of the five-factor solution, which accounts for 72.7 per cent
of the total variance, are presented in Table 4. The asterisks indicate those loadings
which exceed 0.3, i.e., the critical value which is conventionally used to determine
significance.

Bearing in mind that the items for a particular axis and a particular test were
allocated to two subsets at random, these two subsets are expected to agree as
regards the significance or insignificance of their loadings on any one factor. In fact,
TABLE 4
VARIMAX LOADINGS OBTAINED IN THE FIVE-FACTOR SOLUTION FOR EACH OF THE TWELVE SUBSETS OF
ITEMS

Test      Axis of   Subset                       Factor
          rotation  of items   1       2       3       4       5
Diagrams  X         a          0.20    0.32*   0.78*   0.11    0.10
          X         b          0.20   -0.06    0.74*   0.23    0.34*
          Y         a          0.14    0.20    0.41*   0.18    0.68*
          Y         b          0.24    0.17    [?]     0.14    0.85*
          Z         a          0.21    0.23    0.23    0.68*   0.09
          Z         b          0.01    0.02    0.08    0.87*   0.17
Models    X         a          0.68*   0.29    0.24    0.14    0.10
          X         b          0.51*   0.4*    0.33*  -0.26    0.03
          Y         a          0.81*   0.16    0.00    0.07    0.29
          Y         b          0.80*   0.15   -0.21    0.03    0.11
          Z         a          0.36*   0.70*   0.06    0.27    0.11
          Z         b          0.21    0.81*   0.15    0.00    0.23

[?] Value illegible in the source; it carries no asterisk.





* Significant loadings. 


Table 4 shows that, using 0.3 as the critical value for loadings, the vast majority of
the pairs of such subsets do show agreement. There are only six cases of disagreement
out of the 30 possible comparisons of this type. For example, factor 5 has a
significant loading on one of the subsets concerned with rotations about the X-axis
for the Diagrams Test, but not on the other subset. These disagreements can have
arisen only as a result of sampling errors in the random allocation of the items to the
subsets.


The direction of these errors can easily be discerned by noting that, where two
subsets of items agree in having significant loadings, the magnitude of both loadings
is always greater than 0.5, thereby exceeding the critical value of 0.3 by a
considerable margin. In contrast, for all six cases of disagreement, the value of the
significant loading is always less than 0.41, thereby only just achieving significance,
and the value of the insignificant loading is well below the critical value. It therefore
seems reasonable to regard these six cases of disagreement as being due to Type I
error, and to conclude that in reality the items are not significantly loaded on the
respective factors at all.

TABLE 5

SUMMARY OF THE PATTERN OF SIGNIFICANT
LOADINGS FOR THE DIFFERENT AXES OF ROTATION
AND DIFFERENT TESTS

Test      Axis of     Factor
          rotation    1    2    3    4    5
Diagrams  X                     +
          Y                               +
          Z                          +
Models    X           +
          Y           +
          Z                +

In Table 5 the results of this decision are implemented in summarising the
pattern of significant loadings to be used in interpreting the factors. Thus each plus 
sign indicates only those cases where both subsets of items for the same axis of rota- 
tion and test had conventionally significant loadings on the same factor. 
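
The decision rule just described can be applied mechanically to the loadings of Table 4. The sketch below is illustrative only: the values are transcribed from Table 4, the one entry that is illegible in the source (Diagrams, Y-axis, subset b, factor 3, which carries no asterisk) is entered as NaN on the assumption that it is not significant, and the variable names are hypothetical.

```python
import numpy as np

CRITICAL = 0.3  # conventional critical value for a significant loading
nan = np.nan    # stands in for the one loading that is illegible in the source

rows = [("Diagrams", "X"), ("Diagrams", "Y"), ("Diagrams", "Z"),
        ("Models", "X"), ("Models", "Y"), ("Models", "Z")]

# Loadings on factors 1 to 5 for subset a and subset b of each test/axis pair.
loadings_a = np.array([
    [0.20, 0.32, 0.78, 0.11, 0.10],
    [0.14, 0.20, 0.41, 0.18, 0.68],
    [0.21, 0.23, 0.23, 0.68, 0.09],
    [0.68, 0.29, 0.24, 0.14, 0.10],
    [0.81, 0.16, 0.00, 0.07, 0.29],
    [0.36, 0.70, 0.06, 0.27, 0.11],
])
loadings_b = np.array([
    [0.20, -0.06, 0.74, 0.23, 0.34],
    [0.24, 0.17, nan, 0.14, 0.85],
    [0.01, 0.02, 0.08, 0.87, 0.17],
    [0.51, 0.40, 0.33, -0.26, 0.03],
    [0.80, 0.15, -0.21, 0.03, 0.11],
    [0.21, 0.81, 0.15, 0.00, 0.23],
])

sig_a = np.abs(loadings_a) > CRITICAL
sig_b = np.abs(loadings_b) > CRITICAL  # a NaN comparison is False, i.e. not significant

# Cases where one subset is significant and its partner is not.
disagreements = int((sig_a ^ sig_b).sum())

# A plus sign is awarded only where both subsets load significantly on a factor.
both = sig_a & sig_b
pattern = {row: [int(f) + 1 for f in np.flatnonzero(both[i])]
           for i, row in enumerate(rows)}

print(disagreements, "disagreements out of", sig_a.size)
print(pattern)
```

Under these assumptions the count of disagreements is six out of 30, and the retained pattern places the models on factors 1 and 2 and the diagrams on factors 3, 4 and 5.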


DISCUSSION 


While the mean scores reported in Table 3 do not bear directly on the particular 
issues being investigated, it is interesting to have the mean scores of the Models Test 
in perspective. The first point to note is that the mean scores for each axis of rota- 
tion are considerably lower than the maximum score possible. For rotations about 
the Z-axis, there is actually no significant difference in the mean scores for the 
Models and Diagrams Tests. Thus, for all three Cartesian axes, these students 
obviously have difficulties in visualising rotations in relation to three-dimensional 
objects as well as diagrams. In the case of rotations about the X- and Y-axes, the 
task of visualising rotations is significantly easier in relation to models than in 
relation to diagrams. It may therefore be that this effect arises due to a lack of 
understanding of how the depth cues should change. 


Fortunately, Table 5 allows a very clear interpretation to be made of each of the 
five factors, since they fall into two main groups. Factors 1 and 2 concern models 
only, and factors 3, 4, 5 refer only to rotations involving diagrams. Thus rotations 
involving models, on the one hand, and diagrams, on the other, have no common 
factors whatsoever. Of those factors which refer only to models, factor 1 is 
associated with rotations about both the X- and Y-axes, whereas factor 2 refers only 
to rotations about the Z-axis. In contrast, the three factors concerning diagrams 
each refer to rotations about a different Cartesian axis. 


It therefore seems that the two types of task are psychologically distinct, and 
that the ability to visualise rotations in diagrams is not related to the ability to 
visualise rotations in models. The different conclusions drawn by Metzler and 
Shepard (1974) were based upon the results of an earlier experiment (Shepard and 
Metzler, 1971) which had been concerned with the speed of visualising rotations 
rather than the accuracy as measured in the present experiment. For this reason, it is 
difficult to relate the results of the present experiment with those of Shepard and 
Metzler, and those of other previous experiments (e.g., Cooper and Shepard, 1973; 
Cooper, 1975; Cooper and Podgorny, 1976). This lack of comparability seems even 
more likely in view of the fact that Egan (1976) has found no correlation between 
measures of accuracy and speed in performing mental rotations. 


In considering the educational implications, it must be borne in mind that the 
present results merely describe the status quo, and not what could, or should be. In 
fact, the observations may actually highlight the nature of the difficulties which 
students encounter in visualising rotations from diagrams, in the sense that the 
students are apparently not drawing a clear relationship between visualising 
rotations involving models, on the one hand, and diagrams on the other. If this be 
the case, the task of the teacher should be seen as leading the students to make this 
relationship. 


A step in this direction may have already been made by Seddon, Eniaiyeju and 
Jusoh (1984) in an investigation concerned with teaching the visualisation of rota- 
tions in diagrams to students very similar to those taking part in the present 
investigation. Thus Seddon, Eniaiyeju and Jusoh compared the effectiveness of 
three teaching methods which varied progressively in the extent to which students 
were required to relate the rotation of real three-dimensional models to the process 
of visualising the same rotations from diagrams. In the least effective method, the 
students were given no three-dimensional models to rotate or compare with the
diagrams. In a significantly more effective method, the students viewed models 
actually rotating, and compared their changing profiles with diagrams representing 
the structures before and after rotation. However, the most effective method was 
one where the students simultaneously viewed the rotating models and their shadows 
projected on to a screen behind the models. The superiority of this method was 
presumably due to the fact that the students could now see and make most easily the 
necessary relationships between the three- and two-dimensional representations. In 
the light of the findings of the present experiment, it would be interesting to see if 
the corresponding factor structures for visualising rotations from models and 
diagrams also changed as a result of this teaching. 


Finally, the results of the present investigation showed that the pattern of the 
factors concerned only with diagrams parallels exactly that which has been observed 
in the two previous factor analytic investigations with English students (i.e. Seddon, 
Tariq and Santos Veiga, 1982; Seddon, Eniaiyeju and Chia, 1985). Thus the results
seem to be eminently reproducible.


The pattern of loadings for rotations involving models parallels the formal 
difference described previously in terms of these rotations which produce a change 
in profile (i.e., rotations about the X- and Y-axes), and those rotations which do not 
(i.e., rotations about the Z-axis). Moreover, the fact that the pattern does not 
further separate the rotations about the X-axis from those about the Y-axis suggests 
that the phenomenon of a changing versus constant profile is a key issue. It is
intriguing, although not easily explained, why rotations about the X- and Y-axes are 
separated in the case of diagrams but not models. 


Requests for reprints should be sent to Dr. G. M. Seddon, Chemical Education Section, 
School of Chemical Sciences, University of East Anglia, Norwich, England, NR47JT. 
SPATIAL REPRESENTATIONS IN MATHEMATICALLY AND
IN ARTISTICALLY GIFTED CHILDREN 


By B. HERMELIN AND N. O'CONNOR
(MRC Development Psychology Project, Institute of Education, London)


Summary. Mathematically and artistically gifted children and two IQ-matched control 
groups were compared for their abilities to carry out several types of visual-spatial tasks. 
One type of task was verbally presented and required a verbally formulated answer, while 
another was presented visually and the answer involved a choice of one of several visually 
presented alternatives. Two other types of problem were investigated, and these were 
respectively concerned with short-term memory for non-verbalisable shapes, and with the 
capacity to name objects or animals on the basis of minimal visual cues. Mathematically 
gifted children were better than all the other groups in solving the verbal-spatial 
problems. On the other hand, artistically gifted children were particularly able at con- 
structive imagination on the basis of minimal perceptual cues. Despite being of different 
IQ levels, both mathematically and artistically gifted children were equally good and were 
superior to IQ-matched controls in visual recognition memory. The results justify the 
inference that level of performance on some of the tasks tested here is to some degree IQ- 
independent, but is related to specific artistic or mathematical talent. 


INTRODUCTION 


THE present study limits itself to the investigation of some specific aspects of spatial
ability, and it has two aims. The first is to investigate which of the tested com- 
ponents of children's spatial ability may be dependent on, and related to, their 
general level of cognitive functioning. The second is to compare possible differences 
in the capacity to carry out certain spatial operations in children who are equally 
intelligent but are or are not specifically gifted either for mathematics or for the 
visual arts. 


The association of spatial ability with processes involved in mathematics as well 
as the visual arts, and its disassociation for instance from writing, is well attested. 
For example the novelist Aldous Huxley (1970) stated that although words mediated 
his thoughts they did not tend to evoke visual images in his mind. On the other hand 
McKim (1972) reports Einstein as stating that for him words, whether written or 
spoken, did not play any part in his thought processes. Instead, his thinking 
proceeded in terms of signs and images which he could reproduce and recombine at 
will. Thus it seems that an ability to think in words need not necessarily lead to 
visual images while thinking about space does not require verbal mediation. 


That painting, drawing and sculpture involve special sensitivity to the visual- 
spatial world seems self-evident. There are also accounts of some great artists which 
refer to particular aspects of their visual-spatial skills. Thus in 1550 Vasari (1957) 
related that Michael Angelo had an outstanding visual memory which allowed him 
to retain in every detail works of other artists which he had only seen once. He also 
remembered everything he himself had ever produced, and so never repeated 
himself. A different aspect of visual spatial ability which has some relevance to the 
present study seems to have been regarded as important by Leonardo da Vinci. He 
is reported by Bachelard (1964) to have urged his students to contemplate the cracks 
in the old Florentine walls, in order to consider what possible forms these might 
suggest. We took a similar approach in an experiment (O'Connor and Hermelin, 
1983) where we asked artistically gifted children to identify drawings from partial 
information. It was found that such children needed fewer visual cues than did IQ-matched 
controls. Thus, this imaginative capacity to recognise forms on the basis of 
minimal information may be characteristic of talented artists. We have also investi- 
gated the recognition memory of artistically endowed children for non-verbalisable 
and for verbalisable shapes (O’Connor and Hermelin, 1983). It was found that 
children who were gifted in the visual arts recognised non-verbalisable shapes 
(Arabic letters) as well as they did verbalisable ones (Roman letters). This was in 
contrast to a control group matched for IQ, whose recognition memory was much 
better when the stimuli were codable in verbal form than when they were not. We 
concluded from these results that the artistically gifted were better able to rely on 
non-verbal, visual memory. 


As far as mathematically gifted children are concerned, Furneaux and Rees 
(1978) suggested the existence of a specific mathematical ability independent of 
general intelligence and Krutetskii (1976) emphasised the importance of the capacity 
of schoolchildren to interpret spatial problems in symbolic or verbal terms as 
contributing to mathematical skill. Investigating a number of children with out- 
standing mathematical talents, he provided many fascinating accounts of mental 
processing by these subjects. Amongst the components of mathematical skill which 
were isolated in these studies were the ability to grasp the basic structure of 
mathematical problems and to formalise the perception of mathematical material in 
terms of propositions. The capacity for logical thought in the sphere of quantitative 
and spatial relationships was also regarded as crucial as was the ability to form 
spatial concepts, and to visualise spatial relationships. 


Thus, it seems justified to assume that both artistically and mathematically 
gifted children may process some visual-spatial data more efficiently than do 
children of similar general intelligence but without these gifts. The question asked in 
the present context is which particular components of spatial ability these two 
groups of gifted children might share, and which spatial operations might 
distinguish them. 


METHOD 


In order to investigate this, two groups of children, one gifted in mathematics 
and the other in the visual arts, were each matched with control subjects on a stan- 
dardised intelligence test. The four groups of children were then compared on 
several operations. First, we asked whether gifted mathematicians had a specific 
ability to solve verbally presented spatial problems by recoding verbally stated 
propositions into spatial quantities. (See Appendix 1.) In order to control for levels 
of verbal ability per se in so far as the different groups were concerned, non-spatial 
verbal reasoning problems were also presented. Second, the children’s capacity 
mentally to manipulate visually presented spatial material was tested. Such tasks 
involved the mental rotation and re-orientation of shapes. Responses required the 
selection of one of several visually presented alternatives, and words were not 
involved. Immediate visual recognition memory for non-verbalisable visually 
presented shapes (Persian letters) was also included in this series. Finally, a task was 
presented in which subjects guessed the nature of a partially presented outline 
drawing of an object or animal. In previous experiments (O’Connor and Hermelin, 
1983), success in this test of constructive imagination was found to be characteristic 
of artistically gifted children. 


Subjects 

The children used in the following experiments were pupils from a compre- 
hensive, i.e., mixed ability school in Southern England. Two groups were selected by 
their teachers and by the headmaster, according to whether they fell into the top 2 
per cent of gifted artists or mathematicians in the 12- to 14-year-old age group. In 
addition, all were tested on the English Progress Test, E2, of the National 
Foundation for Educational Research (1979) and matched on this test with control 
subjects who were not gifted for either mathematics or the visual arts. This 
procedure resulted in four groups with a mean age of 13 years 6 months. There were 
10 mathematically gifted children with a mean IQ of 117 (SD 6) and an IQ-matched 
control group of 7 subjects. There were also 9 artistically gifted children with a mean 
IQ of 107 (SD 6) and this group was matched with 8 controls on the same English 
Progress test, which measures verbal ability and scholastic attainment. 


RESULTS 


Verbally presented spatial problems 

The set of tasks in this condition aimed to test the subject’s ability to translate 
verbally formulated problems into spatial terms, manipulate spatial images and 
recode them into verbal responses. None of the 12 questions which were asked 
attempted to test specific acquired mathematical knowledge, but all required 
reasoning on the basis of spatial representations. The questions are presented in 
Appendix 1. 


The 12 problems proved to be of unequal difficulty. Thus, question 2 was 
answered correctly by only a few children while most subjects responded correctly to 
question 4. Overall, questions which referred to formal, two-dimensional geometry 
seemed easier than those referring to three-dimensional space or to less formal 
configurations like the hands of a clock or the rotation of a letter. However, as these 
relative levels of difficulty were similar for all four groups, the correct responses to 
the 12 questions were combined, resulting in one verbal-spatial score for each 
subject. The means of the groups for these scores are presented in Table 1. 


TABLE 1 
VERBALLY PRESENTED SPATIAL PROBLEMS 


Groups                   Mean Score    SD 

Mathematical Children       8.30      1.83 
Mathematical Controls       6.85 
Artistic Children           5.90      1.17 
Artistic Controls           6.00 


However, homogeneity of variance was lacking in the scores and therefore these 
means are not representative of group performance and a non-parametric test 
(Mann Whitney U) was employed for analysis of the data. 
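
As an illustration of this procedure, the U statistic can be computed from pooled ranks as in the minimal sketch below. The function name and the scores are hypothetical; the scores merely mirror group sizes of 10 and 7 and are not the data of this study. Tied scores receive the mean of the ranks they occupy, and the smaller of the two U values is returned, as is usual when tables of critical values are consulted.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two Mann-Whitney U statistics."""
    pooled = sorted(list(sample_a) + list(sample_b))

    # Assign each distinct score the mean of the ranks it occupies (ties share a rank).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Sorted positions i..j-1 correspond to ranks i+1..j.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j

    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(rank_of[x] for x in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical verbal-spatial scores (0 to 12), for illustration only.
gifted_scores = [9, 8, 10, 7, 9, 11, 8, 6, 9, 6]
control_scores = [7, 6, 8, 5, 7, 9, 6]
print("U =", mann_whitney_u(gifted_scores, control_scores))
```

The resulting U would then be compared with the tabulated critical value for the two group sizes; the smaller U is, the less the two sets of scores overlap.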


The results of this test showed that the mathematically able children differed 
from their IQ-matched controls near the P = 0.05 level (U = 15 for Ns. of 7 and 10) 
whereas the Artistic and Artistic control groups did not differ from each other (U = 
32 for Ns. of 8 and 9). This result suggests that a factor other than intelligence is 
affecting the results. Significant differences were obtained as might have been 
expected between the Maths and Art groups (U = 15 for Ns. of 9 and 10; P < 0.02) 
and between the Maths and Art Control Groups (U = 12 for Ns. of 8 and 10; P < 
0.02). These latter two differences reflect the IQ differences between the groups. 


Non-spatial verbal reasoning tasks 

In this investigation each child was presented orally with 16 verbal reasoning 
problems. Of these, four were in the form of Three Term Series Problems, e.g., 
“The red car is faster than the blue car, the green car is faster than the red car. 
Which is the slowest?” Another set of four problems presented two sentences in 
different verbal forms, and the subject had to decide whether or not these sentences 
had the same meaning. An example of this type of semantic identity problem is, 
“John is more industrious than Peter. Peter is less lazy than John. Do these two 
sentences have the same meaning or different meanings?” A further set of four 
problems was presented in the form of analogies, e.g., “A palm tree is to a flagpole 
as a snake is to a zoo, or a garden hose. Which?” Finally, there were four 
syllogisms, such as: “Italians like opera more than Germans do. Germans like opera 
more than the English. Therefore Italians like opera more than the English. Is this 
correct reasoning, or not?” 


In order to ascertain whether the number of correct responses to the four types 
of verbal reasoning problems (i.e., Three Term Series, Semantic Identity, Analogies 
and Syllogisms) differed from each other, an analysis of variance was carried out 
comparing these four sets of scores. This resulted in a non-significant F of 1.166. 
The scores for each subject were therefore combined, resulting in a possible range 
from 0 to 16 correct responses for each subject. Once again homogeneity of variance 
was lacking and a non-parametric test was used for analysis. 
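
The F ratio referred to above compares the variation between sets of scores with the variation within them. The sketch below computes a simple one-way F under stated assumptions: the function name and the scores are hypothetical, and the four sets are treated as independent groups, which is a simplification, since in the study each child answered all four problem types and the design actually used is not described in detail.

```python
def one_way_f(groups):
    """Return the one-way analysis of variance F ratio for a list of score lists.

    Raises ZeroDivisionError if there is no variation at all within the groups.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means about the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores about their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    mean_square_between = ss_between / (k - 1)
    mean_square_within = ss_within / (n_total - k)
    return mean_square_between / mean_square_within


# Hypothetical numbers correct (0 to 4) on each problem type, for illustration only.
three_term_series = [3, 4, 2, 3, 4, 3]
semantic_identity = [3, 3, 4, 2, 4, 3]
analogies = [4, 3, 3, 3, 4, 2]
syllogisms = [3, 4, 3, 4, 2, 3]
print("F =", one_way_f([three_term_series, semantic_identity, analogies, syllogisms]))
```

An F close to 1 indicates that the sets differ no more than would be expected from the variation within them, which is the situation in which combining the four sets into a single score is reasonable.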


A Mann-Whitney U Test was thought appropriate. Test results indicated that 
children with high scores on the NFER Test, i.e., mathematicians and their controls, 
tended to obtain high scores on verbal reasoning tasks. Thus mathematical children 
had higher scores than artistically able children (U = 12 for Ns. of 9 and 10; 
P < 0.02) and than the Art Control group although only at a very low significance 
level (U = 20 for Ns. of 8 and 9; P < 0.10) while the mathematicians did not differ 
from their matched control group. Thus these results reflect IQ levels. The means 
and standard deviations of the groups are set out in Table 2. 


TABLE 2 
NON-SPATIAL VERBAL REASONING TASK 


Groups                   Mean Score    SD 

Mathematical Children      14.20      1.23 
Mathematical Controls      13.00      2.08 
Artistic Children          11.44      2.24 
Artistic Controls          12.13      2.80 


Visually presented spatial problems 

In this condition the material was visually presented, and the subject was asked 
to select one of several visually presented alternatives for a response. Thus, no 
verbal encoding was required in the tasks themselves apart from the general instruction. 
The tasks which were included in this condition were: (a) to present the subject 
in turn with perspective drawings of a cube, an irregular figure, and three shapes, 
and ask him to select from each of the three presented alternatives the one which 
would result if each of the three displays were viewed in turn from one of four 
different directions; (b) to present in turn four depicted shapes (a four-sided 
pyramid, a cone, a hemisphere and a cylinder) and require the subject to point to 
the one of the presented alternative drawings which would result if the shapes were 
sectioned parallel to the base; (c) to present the subject simultaneously with four 
pairs of Persian letters for 10 seconds and then ask him to reconstruct the pairs from 
memory. Illustrations of these tasks are given in Figure 1. 


FIGURE 1 
EXAMPLES FOR VISUAL-SPATIAL TASKS 


(top = Rotation, middle = Sectioning, bottom = Memory) 





The scores from mental rotation problems and imagined sections (see Fig. 1) 
did not differ significantly from each other and were combined for analysis. 
Comparison of these combined scores was made by use of the Mann Whitney U 
Test. No significant differences between the groups emerged. This could well be 
because the tests proved too easy, resulting in scores for each group which were near 
ceiling level. 


A comparison of group performance on the memory task for paired associate 
shapes of Persian letters however did show some group differences. Mathematically 
and artistically able children did not differ from each other despite the IQ 
differences between them (U = 36 for Ns. of 9 and 10; P = NS). However, mathematical 
children differed from their IQ-matched controls near the 0.05 level of 
significance (U = 16 for Ns. of 7 and 10), and the artistically able children differed 
from their IQ-matched controls (U = 15 for Ns. of 8 and 9; P < 0.05). Thus the 
results in this task did not reflect IQ level but ability in both the mathematicians and 
the artists to remember non-verbalisable shapes. 


Constructive imagination: identifying incomplete pictures 

In this task the subjects had to identify and name drawings of objects or 
animals on the basis of minimal necessary information. The child looked into a 
display box in which, against a black background, only a small portion of the 
drawing was lit up at the first trial. On subsequent trials progressively more of the 
outline was revealed until a correct identification was made. After an example, 
pictures of a boot, a bear, a goat and a banana were in turn presented in this 
manner. The score was the number of exposures needed, each in turn providing 
more details, which were necessary for identification. 


The numbers of trials needed for identifying each picture were added for the 
four displays. In this instance comparison of mean group scores by Mann Whitney 
U Test showed that the artistically able group was superior to the mathematically 
able group (U = 17 for Ns. of 9 and 10; 0.02 < P < 0.05) and to the Art Control 
Group (U = 12 for Ns. of 8 and 9; 0.02 < P < 0.05). They also proved superior to 
the Mathematics Control Group (U = 7 for Ns. of 7 and 9; P < 0.02). Thus the 
artists proved better on this task than subjects of higher IQ. 


The distribution of mean scores and standard deviations for groups is given in 
Table 3. 


TABLE 3 
IDENTIFICATION OF INCOMPLETE PICTURES 


                         Mean No. of 
Groups                   Exposures     SD 

Mathematical Children      23.60      5.60 
Mathematical Controls      29.00      4.28 
Artistic Children          17.11      6.64 
Artistic Controls          24.88      6.81 


DISCUSSION 


The results from the present study can be summarised as follows: verbally 
presented tasks requiring spatial reasoning and the mental manipulation of spatial 
images, were better solved by the mathematically gifted than by either an IQ- 
matched control group, or by those gifted for the visual arts. This result thus 
confirms findings from the Soviet Union discussed by Krutetskii (1976), namely that 
those who are gifted for mathematics can convert verbal codes into spatial images, 
are able to operate on these images, and can then retranslate their solution into 
verbal form. However, it could be any one or all of these processes which distinguish 
the mathematicians in this experiment. 


As demonstrated in the results, more children in the two specifically gifted 
groups achieved high non-verbal visual-memory scores than did their control 
subjects. As these two gifted groups differed from each other in verbal intelligence 
but did perform equally well on the non-verbal memory test while their respective IQ- 
matched control groups were significantly less successful with the task, these results 
could be interpreted as suggesting some degree of IQ-independence. 


It was also found that artistically gifted children could identify incomplete 
pictures on the basis of less information than was required by any of the other 
groups. This finding confirmed a previous result (O'Connor and Hermelin, 1983) 
and might now be regarded as reliably established. It is of some interest, as the 
ability to arrive at conclusions on the basis of insufficient information has 
previously been regarded as an indicator of high general intelligence. It could be 
assumed that with minimal cues an individual has to draw on information repre- 
sented in some internal ‘‘lexical’’ system which can be accessed to supplement 
incoming data. It might be argued that a marked facility with this operation could be 
a component feature of what is often vaguely referred to in the literature as 
‘creativity’. Of course, ability to access visual images does not necessarily imply 
ability to readily access other kinds of stored information. In any case, what seems 
to distinguish children gifted for the visual arts appears to be not so much an out- 
standing ability to process and mentally operate on incoming perceptual data, but 
rather the ready generation of stored visual images, which can be accessed when 
prompted by minimal perceptual cues. However, such an interpretation is specu- 
lative and the precise nature of this ability will have to be investigated further. 


The conclusions from this study therefore are that while non-spatial verbal 
reasoning is related to verbal IQ, the ability to deal with verbally presented spatial 
problems is not solely so determined. As the mathematically gifted group differed 
from the other three groups in so far as most of its members achieved very high 
scores on verbally presented spatial problems, we would argue that this achievement 
cannot be explained in terms of IQ alone. This is because in an IQ-matched control 
group fewer than half the children scored highly. Thus, the ability to relate spatial 
images to verbal propositions, and the capacity to deal with such images in a concep- 
tualised form, may be specifically characteristic of mathematically gifted children. 


Further, there is an indication that visual short-term memory for non- 
verbalisable items is to some degree IQ-independent, as such a memory system 
operates more efficiently with both the mathematically and the artistically gifted 
than with their IQ-matched controls who have no such special talents. The extent of 
this memory advantage and the precise characteristics of the visual coding processes 
which are involved, need further investigation. Finally, children gifted for painting 
and drawing can readily identify pictures on the basis of minimal visual cues. Their 
ability in this respect is superior to that of children who operate on a higher cognitive 
level, including those with a special talent for mathematics. One possible 
explanation for this finding could be that it may be the generating of items from 
stored pictorial representation rather than the processing of incoming visual spatial 
information, that plays a part in artistic talent. 


APPENDIX 1 
VERBALLY PRESENTED SPATIAL PROBLEMS 


 1. How many diagonals are there on the surface of a cube? 

 2. A painted wooden cube with an edge of 9 cm is cut up into little cubes each of a 3 cm 
    edge. There will be 27 of these little cubes. How many of them will have only two painted 
    sides? 

 3. A pencil is fastened at one end and the other end is free to move in any direction in space. 
    What shape will it describe? 

 4. The mid points of each of the sides of a square are joined together by a line. What figure 
    will result? 

 5. Imagine two figures in which the bases and the heights are the same, i.e., they are equal. 
    Must the areas also be equal? 

 6. The hands of a clock are at right angles to each other and the hour hand points to 12. 
    What two times are possible? 

 7. If I give you a lot of right angle triangles which all have equal sides, one to the other, how 
    many will you need to make up a square? 

 8. How many squares of the same size will I need to make a bigger square? 

 9. If I am facing the door and I turn right to the window and then after that I turn right 
    round (about face), which way will I have to turn to face the door again? 

10. On a 3 x 3 chequer board there are 9 squares. A counter starts in one corner and moves a 
    square at a time round the edge till it gets back to where it began. How many moves must 
    it make to do so? 

11. If I take a small printed letter “b” and rotate it clockwise till it is upside down, what letter 
    will it be now? 

12. If I have two squares and the length of the side of one is twice that of the length of the side 
    of the other smaller square, how many small squares would fit into the big square? 


SEX STEREOTYPES AND ATTITUDES TO SCIENCE AMONG 
ELEVEN-YEAR-OLD CHILDREN 


By ALISON KELLY 
(Department of Sociology, University of Manchester) 


AND BARBARA SMAIL 
(Didsbury Faculty, Manchester Polytechnic) 


Summary. A range of attitude, achievement and sex stereotyping tests were administered 
to 11-year-old children in their first term at comprehensive school. The boys were found 
to be markedly more sex-stereotyped than the girls. Able girls and those from middle class 
homes were slightly less sex-stereotyped than others. For both sexes a feminine self-image 
was weakly linked to low academic achievement, and a masculine self-image to 
high achievement. Children who endorsed sex stereotypes showed less interest than other 
children in learning about the areas of science traditionally associated with the opposite 
sex. 


INTRODUCTION 


One of the reasons that girls tend to avoid scientific and technical subjects at school 
may be that these subjects have a masculine image. During adolescence, children are 
seeking to establish their own gender identity, and their self-image may be vital in 
determining their attitudes towards school subjects. This suggestion is frequently 
heard (see, for example, Kelly, 1981) but at present it is mainly conjecture. Science 
undoubtedly has a masculine image (see Weinreich-Haste, 1981) but this does not 
necessarily repel girls or attract only a certain sort of girl. That is a matter for 
empirical investigation. 


A few studies already exist which look at the relationship between sex stereo- 
types and cognitive measures. Serbin and Connor (1979) showed that 4-year-old 
children (of both sexes) who played with ‘‘boys’ toys’’ (i.e., were more masculine) 
scored higher on spatial than on verbal tests. The reverse was true for children who 
played with ‘‘girls’ toys’’. Vaught (1965), using adult subjects, also found a relation- 
ship between spatial ability and masculinity, irrespective of biological sex. Dwyer 
(1974) showed that the sex typing of reading and arithmetic was associated with 
performance in these subjects across an age range from 8 to 18. And Nash (1975) 
found that although knowledge of sex stereotypes was not in general associated with 
performance on a spatial test for 11- and 14-year-old children, girls who saw them- 
selves as more masculine than other girls also did better on spatial ability tests. 


All the studies mentioned so far are American, and most of them have been 
concerned with ability measures rather than school-based achievements or attitudes. 
In Britain there has been much less work on sex stereotypes, but rather more on 
subject specialism. Smithers and Collings (1981) showed that girls studying science 
in the sixth form see themselves as significantly more masculine than arts specialists. 
However, they used only a single bipolar item to measure masculinity, and their 
research design did not allow them to say which came first, the masculine self-image 
or the orientation towards science. Bradley’s (1981) study of third year pupils 
suggested that personality factors could be used to predict choice, but did not 
investigate sex roles. Sherman and Fennema (1977) found that, when other factors 
were controlled, American girls’ perception of mathematics as a male domain was 
not strongly related to their achievements in maths, or their decision to continue 
studying the subject. However, they did not consider the girls’ self-images. The 
present study examines the relationship between sex stereotypes and attitudes to 
science directly. 


METHOD 
The data used in this paper come from the initial survey of the Girls Into 
Science and Technology (GIST) project. GIST is an action research project which is 
working with teachers to devise and implement strategies designed to encourage 
more girls to study scientific and technical subjects when these become optional (see 
Smail et al., 1982, for a fuller description of the project). Ten co-educational 
comprehensive schools in the greater Manchester area are involved. All 2,065 
11-year-old children entering the first year of these schools were given a number of 
attitude and achievement tests during their first term at secondary school. The tests 
were administered by teachers during normal lesson time, and the results fed back to 
the schools as part of the intervention process. 


The test battery included three cognitive tests (science knowledge, SKTOT; 
spatial visualisation, SVTOT; and mechanical reasoning, MRTOT). There were also 
three attitude questionnaires (Scientific Curiosity, Scientific Activities and Image of 
Science) from which were derived 11 attitude scales (see Table 1). Measures of 
socio-economic status (CLASS, based on parental occupation) and general ability 
(IQALL) were also available. The construction and characteristics of these tests and 
scales are described elsewhere (Smail and Kelly, 1984a; 1984b). In addition two sex- 
stereotyping inventories were devised and administered (described below). 


TABLE 1 
SUBSCALES OF THE ATTITUDE TO SCIENCE QUESTIONNAIRES 


Scientific Curiosity 
PHYSCUR     Desire to learn about physical science 
NATSTCUR    Desire to learn about nature study 
HUMBICUR    Desire to learn about human biology 
TVSCICUR    Desire to learn about spectacular science 

Scientific Activities 
TINKERAC    Previous participation in tinkering activities 
BIOLSCAC    Previous participation in biological science activities 
THEOSCAC    Previous participation in theoretical science activities 

Image of Science 
SCIWORLD    The social consequences of science 
LIKESCI     Personal interest in science 
SCIENT      Image of scientists 
SCIMALE     Science as a masculine subject 


There is now a considerable literature on sex stereotypes, and a number of 
measures have been developed (e.g., Bem, 1974; Best et al., 1977; Connor and 
Serbin, 1977; Spence et al., 1975). However, most studies have been concerned 
either with very young children or with adults, and much less is known about sex 
stereotypes among adolescents. Studies with young children tend to use behavioural, 
play-based measures, and are of course individually administered. By contrast, 
studies with adults usually employ group tests which evaluate personality attributes. 
For our purposes a group test was essential because of the large sample size. But a 
pilot study suggested that 11-year-olds did not in general approve of sex stereotyping 
in personality traits. Thus none of the existing measures seemed suitable for this 
intermediate group, and a new schedule was devised. 


There is a clear conceptual distinction to be made between tests which measure 
knowledge of sex stereotypes and those which measure endorsement of these stereo- 
types (although this is often confused in the literature when the same term is used for 
both). In this study we wanted to differentiate children in terms of the extent of sex 
stereotyping shown in their opinions (not in the extent to which they could recognise 
the commonly accepted stereotypes). So we did not ask them to decide whether an 
activity or occupation was more suitable for, or typical of, females or males. Instead 
they had to rate its suitability separately for each sex and then for themselves. This 
technique has the added advantage (following Bem, 1974) of allowing children to 
express both feminine and masculine orientations, without assuming that these are 
mutually exclusive. 


Two different forms of stereotyping inventory were used, Gender and Occu- 
pational. The children were asked to rate a range of behaviours or occupations on a 
five-point scale from 5 (very good thing or very suitable) through 3 (all right) to 1 
(very bad thing or very unsuitable), first for girls and then for boys. They also rated 
their own manifestation of the behaviour (from 5, “very often”, to 1, “never”) or 
opinion about the job (from 5, “I’d like it a lot”, to 1, “I’d hate it”). Girls’ and 
boys’ average ratings on each item are shown in Tables 2 and 3. 


The Gender Stereotype Inventory was originally intended to tap personality 
traits (e.g., girls are passive, emotional, caring; boys are dominant, logical, tough). 
In the event such characteristics were not, in general, considered more suitable for 
one sex than the other. This is an interesting finding in itself, and one which deserves 
more thorough investigation (especially since Best et al. (1977) have shown that 
young children can recognise sex stereotypes in personality attributes). However, 
items with low discrimination are of little use in building a scale to measure stereo- 
types, and most of these items were dropped at the pilot stage. Those that were 
retained (e.g., worry over a row with a friend; tell people what to do) show only 
small sex differences (see Table 2). Eleven-year-old children seem to accept sex 
stereotypes in behaviour (such as looking after a baby or being interested in guns) 
rather than in personality. This may be because behaviours are more concrete and 
more easily conceptualised by children than abstract personality traits. However, 
even in the realm of behaviour it was difficult to find strongly sex-typed examples. 
By contrast it was very easy to find strongly sex-typed jobs, and those included in the 
final test were chosen to span a range of stereotypes. The difficulty in compiling the 
Occupational Inventory was to find enough stereotypically female jobs to balance 
the male jobs. This, of course, reflects the narrow range of occupations at which 
women in Britain work, and the even narrower range of occupations at which they 
are perceived to work. 


On both the Gender and Occupational Stereotype Inventories the items were 
divided into feminine and masculine groups. However, this was done by slightly 
different criteria. On the Occupational Inventory jobs which were rated significantly 
more suitable for one sex than for the other were assigned to the appropriate group. 
Three jobs (social worker, teacher and reading the news on TV) were not included in 
either group. On the Gender Inventory behaviours were counted as feminine if girls 
said they displayed this behaviour more often than boys said they did. All 
behaviours were assigned to one group or the other, although sometimes the 
differences were small. 
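
The rule used for the Gender Inventory can be sketched as follows. The function name and the mean ratings are hypothetical and serve only to illustrate the rule; the treatment of exact ties (assigned here to the masculine group) is an assumption, as the text does not say how ties were handled.

```python
def classify_behaviours(girls_self_means, boys_self_means):
    """Assign each behaviour to the feminine or the masculine group.

    A behaviour counts as feminine if girls report displaying it more often
    than boys do; every other behaviour is counted as masculine.
    """
    groups = {}
    for behaviour, girls_mean in girls_self_means.items():
        boys_mean = boys_self_means[behaviour]
        # Assumption: an exact tie falls into the masculine group.
        groups[behaviour] = "feminine" if girls_mean > boys_mean else "masculine"
    return groups


# Hypothetical mean self ratings (5 = very often, 1 = never), for illustration only.
girls = {"Look after a baby": 3.9, "Climb trees": 2.8, "Tell people what to do": 2.6}
boys = {"Look after a baby": 2.4, "Climb trees": 3.7, "Tell people what to do": 2.7}
print(classify_behaviours(girls, boys))
```

Because every behaviour is forced into one group or the other, an item with a very small difference between the sexes still contributes to one of the two groups.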


TABLE 2 
MEAN RATINGS GIRLS AND BOYS GAVE ON THE GENDER STEREOTYPE INVENTORY FOR THE SUITABILITY OF 
EACH BEHAVIOUR FOR GIRLS AND BOYS AND THE FREQUENCY WITH WHICH THEY THEMSELVES DISPLAY 
THIS BEHAVIOUR 


                                        Suitability   Suitability 
                                        for girls     for boys      Self done 
                                        G    B        G    B        G    B 

 1. Look after a baby 
 2. Have adventures 
 3. Have long hair 
 4. Tell people what to do 
 5. Get clothes dirty 
 6. Help with housework 
 7. Hug and kiss people 
 8. Help fix the car 
 9. Climb trees 
10. Wear trousers 
11. Go to a disco 
12. Read love stories 
13. Hit someone who is rude to her/him 
14. Play with Girls’ World 
15. Lift heavy things 
16. Cry at sad stories 
17. Make the beds 
18. Mend a bicycle 
19. Have one best friend 
20. Be interested in guns 
21. Make something from wood 
22. Play with skipping ropes 
23. Build a go-cart 
24. Play football 
25. Learn to sew 
26. Worry over row with friend 
27. Play rough games 
28. Wear make up 
29. Play with Scalectrix 
30. Play rounders 

F. Stereotypically feminine behaviour 
M. Stereotypically masculine behaviour 
5.0 = very good thing; 3.0 = all right; 1.0 = very bad thing 

Three scales were derived from each of the Inventories (see Table 4). SEXTYPE 
was the average difference between the suitability of a behaviour or job for girls and 
its suitability for boys. Children who scored low on SEXTYPE tended to give equal 
suitability ratings for girls and boys, whereas children who scored high saw large 
differences in what was suitable for each sex. FSELF and MSELF give the average 
rating for self on the items included in the feminine and masculine groups respec- 
tively. They measure self-image — the extent to which children see themselves as 
feminine or masculine. Any individual child can score high or low on both FSELF 
and MSELF — they are in no sense opposites. 
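
The scale definitions above can be illustrated with a short Python sketch. It is an illustration only, not the authors' scoring procedure: it assumes that SEXTYPE averages the absolute difference between the two suitability ratings (the text says "average difference" without stating how sign is handled), and the function names, item groupings and ratings are all hypothetical.

```python
from statistics import mean


def sextype(suit_girls, suit_boys):
    """Mean absolute gap between 'suitable for girls' and 'suitable for boys'.

    Assumption: absolute differences are averaged; the paper does not say so.
    """
    return mean(abs(g - b) for g, b in zip(suit_girls, suit_boys))


def self_scale(self_ratings, items):
    """Mean self-rating over the items assigned to one group."""
    return mean(self_ratings[i] for i in items)


# Hypothetical ratings from one child on four items (1.0 to 5.0 scale).
suit_girls = [5, 1, 4, 2]
suit_boys = [1, 5, 4, 2]
self_done = [4, 2, 5, 1]
feminine_items = [0, 2]   # hypothetical item grouping
masculine_items = [1, 3]  # hypothetical item grouping

child_sextype = sextype(suit_girls, suit_boys)
child_fself = self_scale(self_done, feminine_items)
child_mself = self_scale(self_done, masculine_items)
```

Because FSELF and MSELF are averaged over separate item groups, nothing in the arithmetic forces one to fall as the other rises, which is why a child can score high or low on both.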

Most children completed either the Occupational or the Gender Stereotype 
Inventory but about 200 pupils at one school did both. The intercorrelations 
between the corresponding scales on the two questionnaires for these children are 
shown in Table 5. These suggest that despite the differences in their construction, 
the two inventories are measuring similar attributes for girls. Taken together with 
the high reliabilities in Table 4 these figures suggest sex-typing, masculinity and 
femininity are identifiable and relatively stable traits of 11-year-old girls. The inter- 
correlations between the two inventories are lower for boys, although the scale 
reliabilities are still high. Perhaps boys use different criteria of masculinity and 
femininity for judging occupations and behaviours. 

TABLE 3 

MEAN RATINGS GIRLS AND BOYS GAVE ON THE OCCUPATIONAL STEREOTYPE INVENTORY FOR THE 
SUITABILITY OF EACH OCCUPATION FOR GIRLS, BOYS AND THEMSELVES 

Columns: Suitability for girls (G, B) 

Librarian 
Police officer 
Electrician 
Social worker 
Model 
Computer operator 
Taxidriver 
Looking after children 
Car park attendant 
Teacher 
Cleaner 
Making transistor radios in a factory 
Car mechanic 
Secretary 
Reading the news on TV 
Making cars in a factory 
Mending typewriters 
Ballet dancer 
Manager of a large firm 
Air pilot 
Scientist 
Making clothes in a factory 
28. Shop assistant 
29. Miner 
30. Doctor 

F. Stereotypically feminine occupation 
M. Stereotypically masculine occupation 
- No clear gender stereotype 
5.0 = very suitable; 3.0 = all right; 1.0 = totally unsuitable 

TABLE 4 
SEX-STEREOTYPING MEASURES 

Variable  Inventory  No. of  Description                         Reliability    N 
name                 items                                       Girls  Boys    Girls  Boys 
SEXTYPE   O          30      Average difference in rating of     0.89   0.89    358    341 
                             suitability of jobs for girls and 
                             for boys. 
          G          30      Average difference in rating of     0.89   0.90    440    434 
                             suitability of activities for girls 
                             and for boys. 
FSELF     O          10      Own liking for feminine jobs.       0.78   0.77    415    405 
          G          15      Own participation in feminine       0.67   0.67    473    471 
                             activities. 
MSELF     O          17      Own liking for masculine jobs.      0.85   0.82    415    405 
          G          15      Own participation in masculine      0.85   0.82    473    471 
                             activities. 

O = Occupational Inventory; G = Gender Stereotype Inventory. 





ALISON KELLY AND BARBARA SMAIL 163 


TABLE 5 
INTERCORRELATIONS BETWEEN OCCUPATIONAL AND GENDER SCALES 

            Girls     Boys 
SEXTYPE     0.51      0.31 
FSELF       0.47      0.40 
MSELF       0.56      0.18 
N           91-100    72-86 

RESULTS 


On both the Occupational and Gender Stereotype Inventories boys appeared to 
be considerably more sex-typed than girls (see Table 6). Boys made a greater distinc- 
tion than girls between behaviours and jobs that were suitable for girls and those 
that were suitable for boys. This was also evident in the children’s self-ratings 
(FSELF and MSELF) which were much more polarised for boys than for girls. The 
boys saw themselves as high on masculinity and low on femininity, whereas the girls’ 
self-ratings were more moderate on both scales. The same trends were evident on 
both the occupational and gender scales, and they were repeated on the SCIMALE 
scale (Smail and Kelly, 1984b). Boys were much more likely than girls to see science 
as more suitable for males than for females. The same trend was found for mathe- 
matics by Fennema and Sherman (1977) in the US and by Sturgeon (1981) in 
Britain: boys were far more likely than girls to stereotype maths as a male domain. 


TABLE 6 
AVERAGE SCORES FOR GIRLS AND BOYS ON THE SEX STEREOTYPING SCALES 

            Occupational          Gender 
            Girls     Boys        Girls     Boys 
SEXTYPE     1.7       1.9         1.1       1.4 
FSELF       3.4       1.8         3.1       2.2 
MSELF       2.4       3.6         2.8       3.8 
N           420-430   419-444     494-531   500-556 

All sex differences significant beyond the 0.1 per cent level. 


The intercorrelations between sex stereotyping and self image as masculine or 
feminine are shown in Table 7. For both sexes, but particularly for boys, there is a 
positive relationship between masculinity and femininity. Spence et al. (1975) found 
a similar pattern among American college students. Far from being opposites, 
children who enjoy masculine activities also seem to enjoy feminine activities. This 
may indicate a general enthusiasm for life, a tendency to do things or think that jobs 
would be interesting. On the other hand the distinctions between FSELF and 
MSELF are evident from their correlations with SEXTYPE. Girls who see them- 
selves as masculine tend to be less sex-typed than other girls, whereas those who see 
themselves as feminine tend to be slightly more sex-typed than others. There is an 
equivalent pattern for boys — feminine boys are less sex-typed and masculine boys 
are more sex-typed than others. In other words, accepting the traditional sex- 
appropriate role for oneself is linked to sex stereotyping of appropriate activities for 
others. 

TABLE 7 
INTERCORRELATIONS BETWEEN SEX-STEREOTYPING AND SELF-IMAGE AS MASCULINE AND FEMININE 

Occupational Inventory correlations are above the diagonal (x); Gender Inventory correlations are below it. 

GIRLS 
            SEXTYPE   FSELF     MSELF 
SEXTYPE     x         -         -0.40 
FSELF       0.13      x         0.17 
MSELF       -0.15     -         x 

BOYS 
            SEXTYPE   FSELF     MSELF 
SEXTYPE     x         -0.40     0.17 
FSELF       -0.20     x         0.28 
MSELF       0.33      0.31      x 

All correlations shown are significant at or beyond the 5 per cent level. 
Correlations less than 0.10 not shown. 


Surprisingly little is known about the relationships between sex stereotypes and 
basic background factors such as socio-economic status and general ability — 
perhaps because so much of the work has been done on American college students. 
Table 8 explores these relationships for 11-year-old school children. The correlations 
are generally small. Among girls, more able children and those from middle class 
backgrounds are slightly less sex-typed than less able and working class children. 
However, this does not hold for boys; more able boys are if anything slightly more 
sex stereotyped than less able boys, and class is unrelated to sex stereotypes for boys. 
More able girls are much less likely to see science as a masculine subject, but again 
the effect is smaller, although still significant, for boys. For both sexes a feminine 
self image is linked to lower ability — which perhaps reflects the popular stereotypes 
of females as less able than males in our society. But whereas masculinity is related 
to higher social class and ability for girls, the reverse is true for boys. This suggests 
that above average masculinity in boys may reflect an anti-academic, macho culture, 
whereas in girls it indicates the breaking of cultural restrictions on feminine achieve- 
ment (see Maccoby, 1966, for a fuller development of this idea). CLASS and 
IQALL are themselves related, although only weakly (r = 0.29 for girls and 0.17 
for boys). 

TABLE 8 

CORRELATIONS BETWEEN SEX STEREOTYPES AND GENERAL ABILITY AND 
SOCIAL CLASS FOR ELEVEN-YEAR-OLD GIRLS AND BOYS 

           SEXTYPE          FSELF            MSELF            SCIMALE 
           O       G        O       G        O       G 
GIRLS 
IQALL      -       -0.21    -0.12   -0.18    -       0.11     -0.40 
CLASS      -0.15   -0.19    -0.13   -        -       0.21     -0.15 
BOYS 
IQALL      -       0.16     -0.15   -0.25    -0.11   -        -0.14 
CLASS      -       -        -       -        -0.12   -        - 

All correlations shown are significant at or beyond the 5 per cent level. 
Correlations less than 0.10 not shown. 


The relationships between sex stereotyping and attitudes to and achievements in 
science and technology are shown in Table 9. The effects of the background 
variables, CLASS and IQALL, have been partialled out in this table. This makes 
little difference to the relationships between sex stereotypes and attitudes to science, 
but reduces the correlations between sex stereotypes and the achievement variables 
considerably. The partial correlations are generally small, but nevertheless there are 
some interesting patterns. Sex stereotyped girls, and those who see science as a 
masculine subject, do worse on all three cognitive measures than other girls. This is a 
specific impairment in science and technology related to sex stereotypes, since the 
effect of general ability has already been allowed for. Moreover, it is evident before 
the children have really experienced science or technology at secondary school. For 
both sexes there is a slight tendency for a feminine self-image to be linked to poor 
achievement in the cognitive tests, and a masculine self-image to be linked to high 
achievement. Again this is independent of the effects of class and general ability. 
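
Partialling out two background variables can be sketched with the standard recursive formula for partial correlations. The Python fragment below is an illustration under stated assumptions, not the authors' computation: the function names are hypothetical, the variable label SCORE stands in for any one achievement measure, and the zero-order correlations (all set to 0.5) are invented for the example.

```python
from itertools import combinations
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_r2(r, x, y, z, w):
    """Second-order partial correlation of x and y controlling for z and w.

    r maps frozenset({a, b}) to the zero-order correlation of a and b.
    """
    def c(a, b):
        return r[frozenset((a, b))]

    # First remove z from every pair, then remove w from the results.
    xy_z = partial_r(c(x, y), c(x, z), c(y, z))
    xw_z = partial_r(c(x, w), c(x, z), c(w, z))
    yw_z = partial_r(c(y, w), c(y, z), c(w, z))
    return partial_r(xy_z, xw_z, yw_z)


# Hypothetical zero-order correlations: every pair correlates 0.5.
names = ["SEXTYPE", "SCORE", "IQALL", "CLASS"]
r = {frozenset(pair): 0.5 for pair in combinations(names, 2)}

adjusted = partial_r2(r, "SEXTYPE", "SCORE", "IQALL", "CLASS")
```

In this invented case a zero-order correlation of 0.5 shrinks to 0.25 once both background variables are held constant, the same kind of reduction the text reports for the achievement variables.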


Turning to attitudes to science, we see that approval of sex stereotypes is nega- 
tively related to being keen to learn about physical science for girls and to being keen 
to learn about nature study for boys. In other words, children who think in terms of 
sex roles when judging others have already limited their interest in the branches of 
science most closely associated with the opposite sex, before they have encountered 
science at secondary school. Both masculine and feminine self-images are positively 
related to curiosity about science, which perhaps indicates a general tendency to 
reply positively to questionnaire items. But it is noticeable that a masculine self- 
image is most strongly related to interest in physical science and a feminine self- 
image to interest in nature study for both sexes. This again suggests that children’s 
self-images can affect their desire to learn. 


The same general trend is evident with the Scientific Activities scales. Both a 
masculine and a feminine self-image are linked to greater involvement in these 
activities, but masculinity has the largest correlation with tinkering, and femininity 
with biological science. The Gender Stereotype Inventory in fact includes some 
tinkering activities in its masculinity scale, so part of this effect may be tautologous. 
However, this is not true for the Occupational Inventory or for biological activities. 


On the Image of Science questionnaire, the most interesting effects are with 
SCIMALE. For both sexes, seeing science as more suitable for boys is positively 
related to endorsing sex stereotypes in children’s behaviour. This provides some 
additional evidence of validity for the scales. Moreover girls who see themselves as 
feminine are more likely than other girls to accept the idea that science is masculine, 
whereas girls who see themselves as masculine reject this idea. Seeing science as 
masculine is negatively related to the other Image of Science subscales, all of which 
measure positive attitudes to science. This suggests that a sex-neutral attitude to 
science is part of a general constellation of ideas about science as an open, beneficial 
and pleasurable activity. 


DISCUSSION 


A number of interesting results have emerged from this study. Many of the indi- 
vidual correlations are small, often so small as to seem unimportant even when 
statistically highly significant. (Since the pupils in the GIST survey were not a 
random sample, significance tests are, strictly speaking, inappropriate. However, 
the pupils were not unrepresentative of those in other urban schools and the tests 
have been included to give a rough indication of the probability of getting these 
results by chance.) But when a similar pattern is repeated on several independent 
variables — as with the different measures of sex stereotyping or the different 
cognitive tests — it suggests that there is a consistent relationship, even if it is weak 
or poorly measured. 


Previous research has shown that, at 11 years of age, girls are more 
knowledgeable than boys about sex stereotypes (Nash, 1975). This may be related to 
their greater cognitive and emotional development at this age. Although girls may 
possess this knowledge, the results presented here suggest that they do not use it to 
the same extent as boys in their evaluation of suitable behaviour for females and 
males. On all the sex stereotyping measures, boys scored considerably higher than 
girls. Spence et al. (1975) and Dwyer (1974) also found males exhibiting stronger sex 
stereotypes than females. Dwyer suggests that the pressures to conform to their sex 
role may be stronger for boys than for girls (it is worse to be a cissy than a tomboy) 
but that these pressures are created by the boys themselves in their rigid adherence to 
sex stereotypes. This interpretation is supported by the present results. 


Femininity and masculinity did not appear as opposites in this study. On the 
contrary they were slightly positively related. However their effects were in some 
ways quite distinct. For both sexes femininity was weakly linked to low achievement 
in science related tests, and masculinity to high achievement. This was true even 
after allowance had been made for the effect of general ability (which was itself 
negatively related to femininity, although its relationship with masculinity was more 
ambivalent). The cognitive tests were taken before most of the children had been 
taught much science at school, and their scores reflect the incidental knowledge they 
have picked up from their surroundings. As such they can be considered almost as a 
measure of interest in science and they suggest that femininity is associated with low 
interest in science. This impression is reinforced by other results. Girls who saw 
themselves as feminine had a tendency to see science as masculine, and so, 
presumably, not for them. And children who accepted sex stereotypes showed 
slightly less interest than other children in the branches of science traditionally asso- 
ciated with the opposite sex. 


These results are worrying because they suggest that some children are limiting 
their academic horizons on the basis of their sex stereotypes before they have 
experienced subjects at school. Sex stereotypes, and the perception of science as a 
male domain, are consistently, if weakly, related to willingness to learn science 
among 11-year-old children. It will be interesting to see if these relationships persist 
to the crucial option choices at the end of third year. The children involved in the 
GIST project are being followed through secondary school until they make their 
choices between subjects. The battery of attitude, achievement and sex stereotyping 
tests will be repeated at the option choice stage. By comparing first and third year 
results we hope to be able to see how children's attitudes have developed, and 
whether sex stereotypes which are evident at 11 years of age have indeed inhibited 
the later learning of science and technology. 


THE EFFECTS OF ABILITY, STRATEGY, AND EFFORT 
ATTRIBUTIONS FOR EDUCATIONAL, BUSINESS, AND 
ATHLETIC FAILURE 


By MARGARET M. CLIFFORD 
(University of Iowa) 


Summary. The effects of attributing educational, business, and athletic failure to low 
effort, low ability or use of inappropriate strategy were examined with the use of high 
school teachers as subjects. Strategy attributions were generally found to produce the 
most constructive effects. Perceived affect, support for future endeavour, and a measure 
of negative generalisability were rated significantly more favourably under the strategy 
attribution condition than under the effort attribution condition. The field in which 
failure was experienced interacted with sex; males appear to judge failure in business 
more negatively than failure in sports as measured by perceived affect and negative 
generalisability, while females show reverse trends. The data generally offer strong 
support for the contention that strategy attributions can produce more positive effects 
than effort attributions and that strategy attribution-training may have greater 
educational pay-offs than effort-attribution training. These data also raise questions 
regarding the assumption that behaviour is a linear function of the stability dimension. 


INTRODUCTION 


WHILE theorists agree that the attributional dimensions (e.g., stability, internality, 
controllability) represent continua (Abramson et al., 1978; Weiner et al., 1971), 
most researchers continue to exemplify and experimentally examine these 
dimensions as dichotomies. For example, effort attributions, used to exemplify low 
stability, are contrasted with ability attributions, used to exemplify high stability. 
This operational dichotomising of theoretical continua has led to conclusions 
implying linear functions (Abramson et al., 1978; Meyer, 1980). For instance, since 
effort attributions for failure have repeatedly been found to produce more constructive 
effects than ability attributions (e.g., McMahan, 1973; Fontaine, 1974; Dweck, 
1975; Andrews and Debus, 1978), it is generally held that as the stability of a failure 
attribution increases, the probability and desirability of behavioural change 
decreases (Dweck and Reppucci, 1973; Weiner, 1974, 1979; Weiner and Sierad, 
1975; Meyer, 1980). 


A second and related conclusion that has emerged as a result of this pattern of 
reasoning and research is that training students to change ability attributions for 
academic failure to effort attributions is probably one of the best motivational 
means of improving attitude and performance in educational settings. Both the 
linear function assumption and the effort-attribution training assumption have 
recently been challenged (Clifford, 1986) and both warrant continued examination. 


In initially challenging these conclusions, Clifford emphasised the need for 
identifying and operationally defining an attribution that might exemplify an inter- 
mediate value of stability — an attribution more stable than effort and less stable 
than ability. She suggested that strategy, defined as “methods and techniques used 
to develop skills”, might be used to exemplify such an intermediate value on this 
continuum and she offered several reasons for predicting that strategy attributions 
could be more beneficial than either ability or effort attributions. (For example, 
strategy attributions allow one to reduce or escape guilt typically associated with low 
effort and the embarrassment or shame typically associated with low ability.) 

Among the earliest observations linking strategy and the attributional process 
were those made by Diener and Dweck (1978), who investigated strategy as a 
correlate of ability and effort attributions. On the basis of a pair of studies 
conducted with fifth-graders, these researchers reported that “mastery-oriented” 
children (i.e., children who tend to attribute failure to lack of effort), in contrast to 
“helpless” children (i.e., children who tend to attribute failure to lack of ability), 
engaged in more self-monitoring and self-instruction as well as more effective 
changes in strategy when encountering failure on a three-dimension discrimination 
problem. More than 33 per cent of the mastery-oriented subjects actually 
demonstrated improved strategy when encountering failure; about 15 per cent 
showed a deterioration in strategies. In helpless subjects, the percentages for these 
two categories were 0 per cent and 70 per cent respectively. There were corresponding 
differences in subjects’ verbalisations also. Helpless children, in contrast to 
mastery-oriented children, made significantly more low-ability attributional statements (e.g., 
“I never did have a good memory”), negative affect statements (e.g., “This isn't 
fun anymore”), and solution-irrelevant statements (e.g., “There is a talent show this 
weekend, and I am going to be Shirley Temple”). Mastery-oriented children made 
statements such as “The harder it gets, the harder I need to try”, “I love a 
challenge”, and “I've almost got it now”, reflecting significantly more self-instruction, 
self-monitoring, positive prognosis, and positive affect. Based on their 
findings, Diener and Dweck (1978) concluded, “Helpless children ruminate about 
the cause of their failure and, given their attributions to uncontrollable factors, 
spend little time searching for ways to overcome failure. Mastery-oriented children, 
on the other hand, seem to be directed towards the attainment of a solution. They 
are less concerned with explaining past errors and more concerned with producing 
future success” (p. 460). While this work suggests that effort and strategy 
attributions are highly correlated, it precludes the study of any distinctive effects 
these two attributions might have upon behaviour. 


Among the earliest experimental manipulations of strategy attributions are 
those found in studies conducted by Anderson and Jennings (1980) and Anderson 
(1983). These studies generally demonstrated that subjects led to attribute failure on 
a blood-donors persuasion task to strategy subsequently showed greater persistence 
and more constructive behaviour than subjects led to attribute their failure to 
ability. Consistent with Diener and Dweck’s finding that effort and strategy 
attributions are correlated, Anderson (1983) combined and thereby also confounded 
effort and strategy attributions in a condition which he labelled ‘‘strategy-effort 
attribution’’. He demonstrated that this condition led to greater task persistence and 
more appropriate change in behaviour than did an ability attribution condition. But 
neither of these experimental studies allowed for a comparison of strategy and effort 
attributions. 


Two recent studies that do provide such comparisons are those of McNabb 
(1983) and Clifford (1986). McNabb had seventh-grade teachers read a detailed 
description of a seventh-grade student’s failure experience in a social studies class. 
The student was described as attributing his failure to either ability, effort or 
strategy. The teachers then expressed (1) how lenient the student’s teacher would be 
in judging his work, (2) how negative the student’s feelings would be, and (3) what 
level of future performance they predicted for the student. McNabb found that 
ability attributions elicited significantly more lenient judgments than strategy attributions, 
which in turn elicited significantly more lenient judgments than effort attributions. 
This suggests that behaviour is a linear function of the stability dimension, assuming 
strategy is more stable than effort and less stable than ability. 


McNabb also found that the ratings for student feelings were significantly less 
positive when failure was attributed to ability in contrast to effort; strategy attributions 
produced a student affect rating of an intermediate value and not significantly 
different from that obtained in either of the other two conditions. While this 
second finding offers further support for the linear function between behaviour and 
stability of attribution, it should be noted that contrary to McNabb’s first finding, 
this second finding suggests that strategy attributions may tend to produce less 
desirable effects than effort. (McNabb found no attributional main effect for the 
future performance measure. Neither did she find main effects or interactions 
involving the sex factor.) 


In a related study, Clifford (1986) compared effort and strategy attributions 
using high school teachers and college students as subjects. These subjects were 
provided a detailed description of a female college freshman who had obtained a 
C- average (GPA of 1.8) on her first semester's work and who, in a personal discussion 
with her advisor, attributed her low performance to either effort or strategy. 
Subjects judged both the student's future performance and her attitude toward 
school to be significantly less positive when low academic performance was 
attributed to effort in contrast to strategy. This pattern was found for both high 
school teachers and college subjects. In contrast to McNabb's data, these findings 
challenge rather than support the linear function assumption. Assuming that ability 
attributions would have produced the lowest future performance and attitude 
ratings, and assuming that strategy represents an intermediate level of stability with 
ability and effort representing greater and lesser stability, respectively, Clifford 
contended “a curvilinear in contrast to a linear function would appear to better 
describe the relationship between stability and future performance as well as the 
relationship between stability and attitude” (p. 81). 


In short, the research to date on strategy attributions suggests that (1) while 
positively correlated, effort and strategy attributions have distinguishable effects on 
behaviour, (2) behaviour can at times be more enhanced by strategy than by effort 
attributions, and (3) behaviour may be either a linear or non-linear function of 
stability depending upon the nature and conditions of behaviour. 


These tentative conclusions, coupled with recent learning and metacognition 
research that indicates teaching students learning strategies significantly enhances 
academic performance (Brown and Campione, 1978; Brown, 1981; Cook and 
Mayer, 1983; Garner et al., 1983; Kyllonen et al., 1984), provide strong incentives 
for further examining strategy attributions as compared with ability and effort 
attributions. The potential value of strategy in contrast to effort attributions is 
further emphasised by a recent study that contrasted learning-strategy training, 
effort-attribution training, and the combination of these training conditions: using 
fourth-grade boys, Short and Ryan (1984) demonstrated that low-skilled readers 
given learning-strategy training (e.g., encouraged to vocalise story questions and 
underline information answering questions) made significant gains in compre- 
hension and performed as well as skilled readers on measures of free and probed 
story recall. A group of low-skilled readers given effort-attribution training showed 
virtually no improvement and a group given the combination of effort-attribution 
training and learning-strategy training performed no better or worse than did the 
learning-strategy-only subjects. These authors concluded, “These findings suggest 
that [learning] strategy training was clearly superior to [effort] attributional training 
in its separate effects and appears to be only minimally augmented under 
unconstrained circumstances by motivational components” (p. 233). What is left 
unaddressed by this study is whether strategy-attribution training is a more potent 
motivational technique than effort-attribution training. 


In addition to further comparing the general effects of ability, strategy, and 
effort attributions, there is a need to examine the absolute and relative effects of 
these attributions across diverse fields or situations. Frieze and Snyder (1980) found 
that the nature of the situation significantly affected the type of attributions with 
which subjects explained outcomes. They presented four achievement situations 
(i.e., academic examination, art project, football game, and catching a frog in a 
pond) to first-, third- and fifth-grade children and examined their attributional 
responses for success and failure outcomes in each situation. While very few sex 
differences emerged, there were major differences among the four situations. The 
percentages of subjects giving effort and ability attributions, respectively, for each of 
these four situations were as follows: academic exam, 65 and 15 per cent; art 
project, 27 and 34 per cent; football game, 35 and 23 per cent; frog catching, 24 and 
12 per cent. The authors concluded, “Results of this study clearly indicate that the 
same group of children may use very different causal explanations for success or 
failure in various domains. . . . Effort is seen as generally important but never so 
clearly as in testing situations. Ability is seen as more important for finishing an art 
project successfully and for winning or losing in football” (Frieze and Snyder, 1980, 
p. 194). (It is interesting to note that the 14 attributional categories used to classify 
the subjects' free responses did not include strategy, method, or technique.) 


If, in fact, a given attribution is judged to be more typically associated with 
failure in one situation than in another, it may well be that the expectation and 
evaluative effects of different attributions will also vary with situation. Thus, the 
present study was designed to further examine the effects of strategy attributions for 
failure in comparison with ability and effort attributions across three fields or 
situations; namely, education, business and athletics. It is predicted that strategy 
attributions will generally produce the most constructive effects. A 3 x 3 x 2 
(Field x Attribution x Sex) between-subject design with four dependent measures 
(i.e., future performance expectation, perceived affect, rater's support, and negative 
generalisability) was used. 
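
The structure of the 3 x 3 x 2 design can be laid out by enumerating its cells. The short Python sketch below is illustrative only; the level labels are taken from the text, while the variable names are hypothetical. It assumes that Sex refers to the sex of the rater, since every description concerns the same male target.

```python
from itertools import product

fields = ["education", "business", "athletics"]
attributions = ["ability", "strategy", "effort"]
rater_sexes = ["female", "male"]  # assumption: sex of the rater, not the target

# Every between-subject cell is one (field, attribution, rater sex) combination.
cells = list(product(fields, attributions, rater_sexes))

# The written descriptions vary only by field and attribution.
descriptions = list(product(fields, attributions))

for field, attribution, sex in cells:
    print(f"{field:10s} {attribution:9s} {sex}")
```

The enumeration gives 18 cells but only nine distinct descriptions, because rater sex is a property of the respondent rather than of the material presented.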


METHOD 


Subjects 

The original subject pool consisted of 351 senior high school teachers in the 
state of Iowa. These subjects were randomly selected from the Department of Public 
Instruction’s listing so that they reflected the proportion of males and females 
teaching at this level; 66.6 per cent of the original sample was male. Subjects 
completing and returning the research materials numbered 299 for an 85.2 per cent 
overall return rate; 66.8 per cent of this response sample was male. 
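
The overall return rate follows directly from the two counts reported above. The Python check below is a minimal sketch; the variable names are hypothetical.

```python
pool = 351       # teachers in the original subject pool
returned = 299   # teachers who completed and returned the materials

# Return rate as a percentage, rounded to one decimal place.
return_rate = round(100 * returned / pool, 1)
```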


Materials and procedures 

Nine situational descriptions of a male who experienced a failure outcome and 
one set of response items were developed for this study. Each description reflected 
failure in one of three fields (i.e., education, business, or athletics) for one of three 
reasons (i.e., lack of ability, use of inappropriate strategy, or lack of effort). The 
opening paragraphs of the situational descriptions for the educational, business, and 
athletic fields, respectively, were as follows: 


Ron Caperton is a business student at a major university. He decided to 
expand his knowledge of business by getting a master's degree in business 
management. His goal was to pass the master's degree exam given at the end of 
one year of graduate study. In fact, however, he failed the exam quite 
badly. 


Ron Caperton bought his father's office supply store in a university town. 
He decided to expand his business by adding office furniture and equipment. 
His goal was to make a profit on the furniture and equipment by the end of the 
second year. In fact, he had a rather substantial debt at the end of his second 
year. 

Ron Caperton was a college tennis player at a major university. He decided 
to turn pro after completing college. His goal was to win at least three regional 
tournaments by the end of his second year as a pro. In fact, however, he did not 
win a single tournament out of the 16 that he played during those two years. 


The attribution manipulation was achieved through a quote, supposedly 
representing Ron’s explanation for failure, which he offered to a friend engaged in a 
similar endeavour. The quote used to represent failure attributed to use of 
inappropriate strategy in the educational field is given as an example of the 
attribution manipulation and read as follows: 


I know now why I failed. I simply used the wrong study methods. It takes 
special learning strategies to successfully get a master’s degree. . . I spent too 
much time and energy making notes from taped lectures and too little time and 
energy on reviewing and discussing course and lecture material. I also put too 
many of my class notes in lists rather than in good, logical outline form. In a 
nutshell, my failure tells me I used the wrong strategies . . . 


Some people might have failed at this task because they didn’t try or put 
forth enough effort. Others might have failed because they didn’t have the 
background, ability and talent. But in my case, there is no question that I made 
an honest, sincere effort and I also have the ability needed for the task. I just 
have to admit that I used the wrong learning and study strategies and methods. 


Following the attribution quote were 10 statements each accompanied by a 
6-point ‘‘agree-disagree’’ response continuum. The first three statements were used 
as a dependent measure of future performance; these statements read: 


Ron will probably try again to reach his general goal. 


If Ron does try again, there will be a noticeable change in his work-related 
behaviours. 


If Ron does try again, he will probably succeed in reaching his goal. 


The fourth statement served as a measure of negative generalisability and read: 


If Ron does not try again, he will probably not set any personal goals for a 
while. 


The next four statements were used as a measure of perceived affect; they read: 


Ron feels sad about his failure. 
Ron feels ashamed about his failure. 
Ron feels guilty about his failure. 
Ron probably feels angry about his failure. 


The last two statements served as a measure of support; these statements read: 


If you had offered Ron financial support to help him achieve his original 
goal, you would probably give him some additional financial support if he 
decided to try again. 


If you had succeeded in reaching a goal similar to Ron’s, you would 
probably be willing to give him some free help if he decided to try again. 


Responses to these sets of statements were scored and averaged so that high 
mean values (range = 1 to 6) for each of the four dependent variables represented 
relatively constructive responses to failure (i.e., good future performance, weak 
negative generalisability, weak feelings of shame, guilt, etc. and strong rater 
support). 
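
The scoring rule just described can be sketched in a few lines. This is only an illustration: the coding direction (6 = strongly agree), the choice of which items are reverse-keyed, and the sample ratings are assumptions, since the text states only that high means represent constructive responses.

```python
from statistics import mean

SCALE_MAX = 6  # 6-point agree-disagree continuum


def reverse(score: int) -> int:
    """Flip a 1-6 rating so that 6 becomes 1, 5 becomes 2, and so on."""
    return SCALE_MAX + 1 - score


def composite(ratings, reverse_keyed=False):
    """Average a set of item ratings, reverse-keying them first if requested."""
    if reverse_keyed:
        ratings = [reverse(r) for r in ratings]
    return mean(ratings)


# Hypothetical rater, assuming 6 = strongly agree (not stated in the text).
future_performance = composite([5, 4, 5])              # statements 1-3
# Agreement that Ron feels sad, ashamed, guilty or angry is reverse-keyed
# here so that a high composite means weak negative affect (an assumption).
affect = composite([5, 4, 2, 3], reverse_keyed=True)   # statements 5-8
```

Reverse-keying before averaging is one conventional way to make every composite point in the same direction; the authors' actual scoring key is not given.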


The situational descriptions and response forms were mailed to subjects along 
with a cover letter soliciting subjects’ cooperation, and self-addressed stamped 
envelopes in which materials could be returned. A follow-up letter and second set of 
materials were mailed to 139 non-respondents two weeks after the initial mailing. 


RESULTS 

Pooled within-groups correlations for the individual item responses revealed 
that the mean correlations for items comprising the future performance, affect, and 
support measures were 0.55, 0.44 and 0.33 respectively. While the composite 
measures of support and future performance correlated 0.35, affect and future 
performance and affect and support correlated -0.07 and -0.08 respectively. The 
correlations between the generalisability item and all other items were all modest, 
yielding a mean correlation of 0.16. 


Each of the four dependent variables was analysed using an ANOVA program 
which allowed for unequal Ns. All simple effect tests reported below are based on 
the Tukey-Kramer modification (Kirk, 1982) and the 0.01 level of significance. A 
minimum value of q = 4.12 (3,281) can be assumed for all simple effects reported. 
The cell means for the four dependent variables are presented in Table 1. 
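
For readers unfamiliar with the Tukey-Kramer modification, the statistic for a pair of means with unequal Ns can be sketched as follows. The formula is the standard textbook one; the error mean square is not reported in the text, so the value used below is a made-up placeholder and the resulting q is not a result of the study.

```python
import math


def tukey_kramer_q(mean_i, mean_j, n_i, n_j, ms_error):
    """Studentized range statistic for two group means with unequal Ns."""
    standard_error = math.sqrt((ms_error / 2.0) * (1.0 / n_i + 1.0 / n_j))
    return abs(mean_i - mean_j) / standard_error


# Illustrative call only: ms_error = 1.0 is hypothetical.
q = tukey_kramer_q(4.5, 3.0, n_i=100, n_j=98, ms_error=1.0)
exceeds_minimum = q >= 4.12  # the minimum q cited for the 0.01 level
```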


TABLE 1 
MEANS FOR DEPENDENT VARIABLES 

                              Future 
Conditions        (N)      Performance    Affect    Support    Generalisability 

Males 
  Education 
    Ability       (21)        2.75         3.94      3.95          2.90 
    Strategy      (22)        4.79         4.09      4.11          3.82 
    Effort        (21)        4.41         3.05      3.88          3.00 
  Business 
    Ability       (21)        2.92         2.98      3.45          2.43 
    Strategy      (25)        4.75         3.93      4.26          3.64 
    Effort        (23)        4.09         3.55      3.24          3.09 
  Athletics 
    Ability       (23)        2.35         4.23      3.13          4.30 
    Strategy      (22)        4.21         4.20      4.09          4.09 
    Effort        (22)        3.89         3.83      3.48          3.91 
Females 
  Education 
    Ability        (9)        3.33         2.86      3.56          2.44 
    Strategy       (9)        4.63         3.61      4.28          3.44 
    Effort        (12)        4.64         3.65      3.21          3.25 
  Business 
    Ability       (10)        3.33         3.95      3.75          3.90 
    Strategy      (11)        4.52         4.52      3.95          4.27 
    Effort        (11)        4.91         3.61      3.86          3.27 
  Athletics 
    Ability       (14)        2.95         3.39      3.54          3.93 
    Strategy      (11)        4.06         3.16      4.00          3.73 
    Effort        (12)        3.67         3.65      3.08          2.58 


For future performance there was an Attribution main effect, F = 66.74 
(2,281), P < 0.001 and a Field main effect, F = 8.11 (2,281), P < 0.001. Follow-up 
tests revealed that future performance was judged significantly less favourable 
under ability attributions (M = 2.83) than under either strategy attributions 
(M = 4.53) or effort attributions (M = 4.22). The effort and strategy conditions 
did not differ significantly from each other. Also, future performance following 
failure was judged significantly less favourable in the athletic field (M = 3.48) than 
in either the education (M = 4.07) or business field (M = 4.07), which did not 
differ from each other. No other significant main effects or interactions were found 
for future performance. 
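
As a check on how the attribution means relate to the cell means, the marginal means can be recomputed as N-weighted averages of the Table 1 cells. The sketch below assumes that the reported marginals are simple N-weighted means across the six Sex x Field cells of each attribution condition; the function and variable names are illustrative.

```python
# (N, mean future-performance rating) for each Table 1 cell, grouped by
# attribution condition: three male cells followed by three female cells.
CELLS = {
    "ability":  [(21, 2.75), (21, 2.92), (23, 2.35), (9, 3.33), (10, 3.33), (14, 2.95)],
    "strategy": [(22, 4.79), (25, 4.75), (22, 4.21), (9, 4.63), (11, 4.52), (11, 4.06)],
    "effort":   [(21, 4.41), (23, 4.09), (22, 3.89), (12, 4.64), (11, 4.91), (12, 3.67)],
}


def weighted_mean(cells):
    """N-weighted mean of a list of (n, cell_mean) pairs."""
    total_n = sum(n for n, _ in cells)
    return sum(n * m for n, m in cells) / total_n


marginals = {name: round(weighted_mean(cells), 2) for name, cells in CELLS.items()}
```

Under that assumption the weighted means come out at 2.83, 4.53 and 4.22 for ability, strategy and effort, which agrees with the values quoted in the text.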


For the support measure, only the Attribution main effect was significant, 
F = 12.02 (2,281), P < 0.001. Follow-up tests showed that significantly greater 
support was offered when failure was attributed to strategy (M = 4.13) than when 
it was attributed to either ability (M = 3.54) or effort (M = 3.47). 


For affect, there was a significant Sex x Field x Attribution interaction, 
F = 3.01 (4,281), P < 0.05. Figure 1 depicts the nature of this three-way 
interaction. Simple effect tests indicated that males judged affect to be significantly 
lower when a business (M = 2.98) in contrast to an athletic (M = 4.23) failure was 
attributed to ability, and females judged affect to be significantly more positive 
when a business (M = 4.52) in contrast to an athletic (M = 3.16) failure was 
attributed to strategy. No other simple effect tests for this three-way interaction 
were significant at the 0.01 level. (It may also be worth noting that the patterning of 
means for this three-way interaction could not be detected in any of the other three 
dependent variables.) 


FIGURE 1 
A significant Field x Sex interaction for affect also emerged, F = 7.91 
(2,281), P < 0.001. Mean affect as judged by males for education, business, and 
athletic failure was 3.70, 3.51 and 4.09, respectively; the comparable means for 
females were 3.40, 4.03 and 3.41. The nature of this two-way interaction is depicted 
in Figure 2. 
FIGURE 2 
Follow-up tests indicated that males judged affect to be significantly lower 
given a business failure in contrast to an athletic failure. The males' judgments of 
affect in the athletic situation were also significantly more favourable than were the 
females’ judgments of affect in this condition. The pattern of the interaction suggests 
that sex differences are more pronounced in the business and athletic conditions 
(fields traditionally associated more with males than females) than in the education 
conditions. The only other significant effect for affect was the Attribution main 
effect, F = 4.98 (2,281), P < 0.01. Simple effect tests revealed that affective 
judgments given strategy attributions (M = 3.98) were significantly better than 
were affective judgments given effort attributions (M = 3.54). Affective judgments 
given ability attributions were of an intermediate level (M = 3.63) and did not 
differ significantly from those made in the other two attribution conditions. 


The generalisability measure produced a significant Field x Sex interaction, 
F = 6.63 (2,281), P < 0.01, and an Attribution x Field interaction, F = 2.48 
(4,281), P < 0.05. The Field x Sex interaction was remarkably similar to the 
Field x Sex interaction reported for the measure of affect and is depicted in Figure 
2. Mean generalisability as judged by females was 3.07, 3.81 and 3.43 for 
education, business, and athletics respectively; no pairwise comparisons of these 
means reached significance. Mean generalisability as judged by males, however, was 
significantly worse in education (M = 3.25) and business (M = 3.09) than in the 
athletic field (M = 4.10). Again, this pattern suggests that sex differences related to 
failure attribution are likely to be greater in a business or athletic setting than in an 
educational setting. The Attribution x Field interaction, which is depicted in 
Figure 3, indicates that the superior effects of strategy over effort and effort over 
ability are more likely to occur in an education or business situation than in an 
athletic situation. Ability attributions for athletic failure produced significantly less 
negative generalisability than did ability attributions for business or educational 
failure. 
FIGURE 3 
The only other significant effects for generalisability were the Attribution main 
effect, F = 6.13 (2,281), P < 0.01 and the Field main effect, F = 7.45 (2,281), 
P < 0.001. Failure attributed to strategy produced less negative generalisability than 
did failure attributed to effort, and failure in athletics produced less negative 
generalisability than failure in either business or education. 


In summary, failure attributed to strategy generally produced more 
constructive effects than did failure attributed to effort or ability. On only one of 
four measures, namely, future performance, were strategy and effort attributions 
found to produce similar effects, and even in this instance the means were in the 
predicted direction with strategy (M = 4.53) being more favourable than effort 
(M = 4.22). Figure 4 provides a comparison of the attribution effects for the four 
dependent variables and emphasises the consistency of the pattern produced by all 
four measures. 


FIGURE 4 


DISCUSSION 


The present study offers strong support for the prediction that strategy 
attributions can produce more favourable judgments than effort or ability 
attributions. The present findings are compatible with McNabb’s (1983) finding that 
teachers were more supportive of a student whose failure was attributed to strategy 
in contrast to effort and with Clifford’s (1986) findings that attitude and future 
performance were judged more favourable when the academic failure of a college 
freshman was attributed to strategy rather than effort. 


Available research on the effects of strategy attributions suggests there may be 
great practical value in sensitising teachers and students to strategy attributions and 
in experimentally examining the effects of strategy-attribution training. Such 
attribution-training research might most beneficially be achieved by laboratory and 
field studies which simultaneously examine the effects of learning-strategy training. 


A second major observation is that the effects of attributions on affect or 
feelings appear to be more complex than the effects of attributions on future 
performance, perceived attitude, or support of the failing individual. This, too, is 
consistent with previous research and with the current debates on the affective 
consequences of effort and ability attributions (Brown and Weiner, 1984; Covington 
and Omelich, 1984; Weiner and Brown, 1984). In one or more of the relatively few 
studies designed to examine the effects of strategy as well as effort and/or ability 
attributions, situation, sex, and teacher-student status have all been found to 
moderate the affective effects of attributions. No clear, simple explanation can be 
offered as yet for the interactions which the measure of affect has produced in the 
present study or in earlier studies (e.g., Clifford, 1986). At the same time, it should 
be noted that relatively few main effects or interactions involving the sex factor have 
emerged, and those which have surfaced tend to involve measures of affect and/or 
situations that have traditionally been dominated by males (i.e., business and 
athletics). 


Data from the present study also suggest that the effects of attribution are likely 
to differ with field or situation. Thus, greater attention should be given to such 
situational factors and their distinguishing characteristics as researchers continue to 
develop and refine attributional models and theories. 


Finally, the present data do not necessarily provide a basis for drawing any 
conclusions about the linearity assumption underlying the stability dimension. They 
do, however, raise questions regarding the theoretical and practical value of this 
assumption. If strategy does not represent an intermediate value on the stability 
continuum, what does and how should it be operationalised? If strategy does 
represent an intermediate value, then these data would suggest that behaviour is, at 
least at times, a curvilinear function of stability. Whether or not strategy can or 
should be conceptualised as representing an intermediate value on this continuum is 
open to further debate. But so too is the value of the linear assumption if, in fact, we 
cannot or do not identify attributions which represent intermediate values of 
stability, and then proceed to confirm or disconfirm the assumption. 
Copies of the other eight attribution manipulations are available and can be obtained 
from the author. 


Requests for reprints should also be sent to the author: Professor Margaret Clifford, 
College of Education, University of Iowa, Iowa City, 52242. 
ASSESSING PERCEIVED SELF EFFICACY IN RELATION TO 
MATHEMATICS TASKS: A STUDY OF THE RELIABILITY 
AND VALIDITY OF ASSESSMENT 


By BRAHM NORWICH 


(Department of Child Development and Educational Psychology, University of 
London Institute of Education) 


Summary. This study investigated aspects of the reliability and validity of assessing 
perceived self efficacy in relation to mathematics tasks. Perceived self efficacy was 
assessed in two modes, a direct and generalised mode for two mathematics tasks. Self 
efficacy assessment was conducted over five trials in a repeated measures design with 72 
children, age 9-10 years. Performance accuracy at the tasks was assessed after the first 
and fourth self efficacy assessment. Results showed moderate to high test-retest and 
inter-interviewer reliability for the self efficacy measures. Results also indicated discrimi- 
nant validity for self efficacy judgments for different tasks. There was evidence for the 
association between self efficacy and task performance and mathematics self concept. 
The study confirms some limited aspects of reliability and validity of self efficacy assess- 
ment, but not for the construct in its assumed causal influence on performance 
attainment. 


INTRODUCTION 


Perceived self efficacy is concerned with “judgments of how well one can execute 
courses of actions required to deal with prospective situations” (Bandura, 1982). 
These specific expectations about ability to perform particular actions are assumed 
to influence whether a person will attempt that action, how persistent he or she will 
be and therefore his or her success at the action, provided there are ‘‘adequate skills 
and appropriate incentives”. Efficacy judgments are inferential and are assumed to 
be subject to information from performance attainments, vicarious experiences, 
verbal persuasion and emotional arousal (Bandura et al., 1977; Bandura, 1977, 
1982). As self efficacy judgments are assumed to have motivational effects they are 
considered to be relevant to children’s academic achievement. In the studies of 
Bandura and Schunk (1981) and Schunk (1981) perceived self efficacy predicted 
mathematics performance to a moderate degree. 


Self efficacy has been assessed by pencil and paper self-report measures. The 
general form of the test involves presenting a hierarchy of tasks or a particular task 
in direct form or specific description. The person is asked whether he can perform 
it/them and the yes/no answer is taken as an index of the level of self efficacy. If the 
person says yes, s/he indicates his or her degree of certainty about this judgment on 
a rating scale: this is taken as an index of self efficacy strength. Borkovec (1978) has 
criticised this form of assessment on the grounds that self reports are open to well 
known kinds of distortions. Kazdin (1978) has noted that, despite the face validity of 
the usual self efficacy assessment procedure, there is a need to investigate the 
reliability and validity of the procedure. Kazdin also points out that the assessment 
method could produce reactive effects on behaviour. Bandura (1978) points out the 
need to minimise the distorting conditions and considers reactive effects unlikely, 
though research findings on the reactive effects are inconclusive (Weinberg et al., 
1980; Gauthier and Ladouceur, 1981; Lee, 1984). 


One of the aims of this study was to investigate the reliability of the currently 
used method of assessing self efficacy in relation to an academic task. Of particular 
interest was whether familiar and unfamiliar administrators would elicit consistent 
self efficacy judgments and whether there was consistency of assessment over a short 
period of time (2-3 minutes), once subjects were familiar with the task. The 
particular academic area selected was mathematics as this is an area in which 
motivational processes are considered to be of particular relevance (Dweck and Licht, 
1980). 


If, as Kazdin recommends, construct validation is needed, then it is of interest 
whether a slight change in the assessment operations would result in different self 
efficacy level and strength. Two methods of assessment can be used. In the direct 
method, the task is present, e.g., three instances of a mathematics task, when the 
questions are asked. In the generalised method, the same questions could be asked 
once the items have been removed and reference is made to other items of that kind. 
A further dimension of self efficacy, specificity-generality of the judgment in rela- 
tion to different kinds of tasks, is also relevant to the validation of the self efficacy 
construct. Self efficacy judgments are assumed to be behaviour or task specific. It 
would be expected that self efficacy judgments for a particular task assessed by the 
direct method will be more related to judgments for that task using the generalised 
method than judgments for a different but related task. A further aim of the study 
was to examine this aspect of the discriminant validity of the self efficacy assessment 
method. 


From self efficacy theory it is expected that prior performance attainments 
amongst other factors will influence future self efficacy. It would be expected that 
this influence could occur both by the person observing his/her attainment without 
feedback and by the person taking account of feedback from an outside source. 
Another aim of this study was to investigate whether self efficacy judgments were 
responsive to these different sources. Positive findings would add to the validation 
of the self efficacy construct. However, self efficacy theory does not elaborate in 
much detail on the causal relationships between self efficacy and other factors. 
Despite Bandura’s emphasis on reciprocal causality and a framework which involves 
“within person” factors, self efficacy has not been linked to self concept theories. 
Bandura (1977) has criticised the use of global indices of self conceptions but not 
addressed the possible relationships between self efficacy and certain elements of a 
multifaceted and hierarchical self concept theory (Shavelson et al., 1980). Without a 
commitment to a hierarchical model it is possible to make assumptions about one 
facet of self conceptions at a less general level of analysis, e.g., self judgments of 
mathematics ability. If it is assumed that these self judgments are relatively stable, 
but open to internal and external influence and dependent on what counts as mathe- 
matics, then it can be hypothesised that they will be related to task specific self 
efficacy. From an interactionist perspective it can be expected that maths self 
concept and task specific self efficacy will exert a mutual influence on each other. A 
further aim of this study was therefore to investigate whether mathematics self 
concept is related to task specific self efficacy and to accuracy of self efficacy (the 
match between self efficacy and performance). 


In summary, the study aims to investigate aspects of the reliability and validity 
of assessing self efficacy, in particular, very short-term retest reliability and 
reliability between a familiar and an unfamiliar administrator. Validity of the self 
efficacy construct is investigated in terms of (i) the discriminant validity between self 
efficacy judgments in relation to different tasks, (ii) the responsiveness of self 
efficacy judgments to prior attainments and feedback about accuracy on relevant 
tasks, and (iii) the relationship between self efficacy and accuracy of self efficacy 
and mathematics self concept. 


METHOD 
Subjects 
Subjects were 38 boys and 34 girls from four different mixed sex schools in the 
inner London area comprising four complete classes. They were children in 3rd year 
junior classes, between 9 and 10 years old. Two of the schools were in more advantaged 
socio-economic areas and the other two in less advantaged areas. 


Design 

Self efficacy was assessed in relation to two related mathematics tasks using the 
two methods of assessment, direct and generalised methods. This four-part assess- 
ment was conducted for five repeated trials with about 15 seconds between trials. All 
trials were conducted by the author, except trial 3 which was conducted by the class 
teacher. After the first trial each of the subjects attempted the two tasks without any 
subsequent feedback about accuracy, and after the fourth trial the subjects also 
attempted the task but this time with subsequent accuracy feedback. 


Assessment measures 
Self efficacy 

Self efficacy level was assessed by asking children whether they could or could 
not answer a particular kind of mathematics question. The yes or no answer indi- 
cated self efficacy level. If the answer was yes then a certainty rating on an 11-point 
rating scale was requested, indicating self efficacy strength. Initially, children were 
given practice in the routine for about five minutes using a very difficult and then a 
very easy mathematics question. 
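
The two-part response format (a yes/no judgment giving level, then a certainty rating giving strength) can be represented as a small record. This is a sketch only: the class and function names are hypothetical, and the treatment of a "no" answer as carrying no certainty rating is an assumption based on the instructions read to the children, not a statement of how the data were coded for analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EfficacyJudgment:
    can_do: bool               # yes/no answer: self efficacy level
    certainty: Optional[int]   # 0-10 rating: self efficacy strength


def record(can_do: bool, certainty: Optional[int] = None) -> EfficacyJudgment:
    """Validate one response against the 11-point (0 to 10) rating rule."""
    if not can_do:
        # Assumption: no rating is collected after a "no" answer.
        return EfficacyJudgment(False, None)
    if certainty is None or not 0 <= certainty <= 10:
        raise ValueError("a 'yes' answer needs a certainty rating from 0 to 10")
    return EfficacyJudgment(True, certainty)


confident_child = record(True, 9)
doubtful_child = record(False)
```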


The following was read out to the children for the direct assessment method: 


When I tell you to turn the page you will see three mathematics questions of 
the same type. Look at them and, without trying to answer them, ask yourself if 
you think you can do questions of this type. 


Turn the page and look at the questions. 
(after five seconds). 


Turn the page again. Now tick one of the boxes to show whether you think 
you can or cannot answer questions of this type correctly. 


Only if you think you can get them correct, show how certain you are by 
circling one of the numbers from 0 to 10 along the number line. 


The following was read out for the generalised assessment method: 
Turn over the page. 


Try to think about the kind of questions you have seen on the previous 
page. Imagine other questions of the same type. Ask yourself if you think that 
you can do other questions of this type. Tick one of the boxes to show whether 
you think you can or cannot. 


Only if you think you can get other questions of this type correct, show 
how certain you are by circling one of the numbers along the number line. 


Three examples of the question type were shown to the children for five seconds 
before the efficacy questions were answered. Exposure for five seconds allowed 
them to assess the difficulty but not to solve and write down the answers. For the 
performance trials two items of the question type were used. For each self efficacy 
assessment trial and for both performance trials different instances of the question 
type were used. 


Mathematics tasks 
Each class was working at different levels and topics so pairs of related 
questions of moderate difficulty were selected for each class. The first question of 
each pair required a basic skill with the second question as a related task. Examples 
of the four pairs of questions were as follows: 


     Basic                            Related 

1.   75 x 5                           Tom worked for 4 hours. Each hour he 
                                      earned 39 pence. How much did he earn? 

2.   75 ÷ 5 =                         Jane spent 56 pence on 7 pens. How much 
                                      did each pen cost? 

3.   6² =                             What number multiplied by itself gives 49? 

4.   Fill in the missing number:      Write these in order starting with the 
     45 tenths =                      smallest: 5.7, 5.1, 2.0, 0.9, 3.0 


Mathematics self concept 

A direct measure of self judgment of mathematics ability was constructed prior 
to this study, based on a similar inventory for academic self image (Barker Lunn, 
1970). Seven statements of the form “I’m useless at mathematics” and “I’m very 
good at mathematics” were used. Responses were yes, not sure and no, producing a 
scale from 0 to 14. Aspects of reliability and validity were tested on four classes of 
4th year juniors aged 10-11 years in the same school (N = 73). Retest reliability over 
a one-week period was r = 0.72 (P < 0.01) with mean scores at time 1 and time 2 of 
9.8 and 10.0 respectively. Cronbach’s α was estimated at 0.80 for the scale. There 
were moderate correlations with general mathematics attainment, with r's varying from 
0.30 to 0.48 (P < 0.01). 
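
For reference, the internal-consistency estimate mentioned above can be computed from item-level scores as in the following sketch. The 0/1/2 item coding is inferred from the 0 to 14 range over seven statements, and the responses shown are invented for illustration; they are not the study's data.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item scores per respondent."""
    k = len(item_scores[0])
    item_variances = [pvariance([resp[i] for resp in item_scores]) for i in range(k)]
    total_variance = pvariance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses to three items scored 0, 1 or 2 after keying.
responses = [[2, 2, 1], [0, 1, 0], [1, 1, 1], [2, 1, 2], [0, 0, 1]]
alpha = cronbach_alpha(responses)
```

Population variances are used for both the items and the total; using sample variances throughout would give the same ratio and therefore the same alpha.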


RESULTS 

Reliability 

The three middle trials of the five efficacy trials were used for estimating the 
retest and inter-assessor reliabilities. The first trial was not used because the children 
were not yet familiar with the task. Trials 2 and 4 conducted by the author were used 
for the retest estimate and trials 2 and 3 (author followed by teacher) and trials 3 and 
4 (teacher followed by author) were used for the inter-assessor reliability. Retest 
reliabilities over a 3-minute period were in the medium to high range (phi 
coefficients: 0.79, 0.72, 0.64, 0.79; P < 0.01) for both tasks with both methods of 
assessment. Similar levels of retest reliability were found for self efficacy strength 
measures on all four comparisons (r = 0.79, 0.75, 0.76, 0.87; P < 0.01). 
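
The phi coefficient used for the yes/no level judgments is the product-moment correlation for a 2 x 2 table. A minimal sketch, with invented counts for 72 children that are not the study's data:

```python
import math


def phi_coefficient(yes_yes, yes_no, no_yes, no_no):
    """Phi for a 2 x 2 table of yes/no judgments on two trials."""
    a, b, c, d = yes_yes, yes_no, no_yes, no_no
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Hypothetical cross-tabulation of trial 2 against trial 4 answers.
phi = phi_coefficient(yes_yes=50, yes_no=4, no_yes=3, no_no=15)
```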


Considering the inter-assessor reliabilities, for self efficacy level, the phi 
coefficient ranged from 0.57 to 0.94, median = 0.85, and similar levels of 
reliability were found for self efficacy strength, median r = 0.85. The self efficacy 
certainty levels in the retest trials for both tasks were not significantly different from 
trial 2 to trial 4 (t = 0.7, df = 71, P > 0.05; t = 1.2, df = 71, P > 0.05). For the 
inter-assessor trials there were higher levels of certainty when assessed by the class 
teacher but these were only significant for the related tasks (t = 1.9, df = 71, 
P = 0.06 for trial 2 compared to trial 3, and t = 2.9, df = 71, P < 0.05 for trial 3 
compared to trial 4). 


Discriminant validity 

Discriminant validity, which was investigated using the estimate of self efficacy 
strength, involved comparing the correlation between the direct and generalised 
assessment certainty ratings for each task (two different measures of the same 
characteristic) with the correlation between certainty ratings for the basic and 
related tasks (same measure of related but different characteristic). The former 
correlations were expected to be higher than the latter. These were done for trials 2, 
3 and 4 only. Of the 24 such comparisons made, all were in the expected direction to 
support the discrimination between the assessment of self efficacy strength for the 
basic and related tasks. The comparison of coefficients ranged from r = 0.77 > r = 
0.70 to r = 0.91 > r = 0.51 (all coefficients were significant at the 0.01 level). 
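
The logic of each comparison (same task assessed by two methods versus two tasks assessed by the same method) can be sketched as below. The ratings are invented for five hypothetical children and the function names are illustrative; only the direction of the comparison follows the text.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def supports_discrimination(direct_a, generalised_a, direct_b):
    """True when the same-task r (two methods) exceeds the cross-task r (one method)."""
    same_task = pearson_r(direct_a, generalised_a)
    cross_task = pearson_r(direct_a, direct_b)
    return same_task > cross_task


# Hypothetical certainty ratings (0-10) for five children.
basic_direct = [10, 8, 3, 7, 9]
basic_generalised = [9, 8, 2, 7, 10]
related_direct = [6, 9, 4, 2, 8]
in_expected_direction = supports_discrimination(
    basic_direct, basic_generalised, related_direct
)
```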


Self efficacy and prior performance 


TABLE 1 


MEAN SELF-EFFICACY CERTAINTY RATINGS BEFORE AND AFTER PERFORMANCE TRIAL WITHOUT 
PERFORMANCE ACCURACY FEEDBACK (STANDARD DEVIATIONS BRACKETED) 


Basic task 
                                    Performance level 
Self-efficacy certainty           0         1         2        All 
Before performance                         5.3                 6.5 
                                          (3.5) 
After performance                          5.9                 7.9 
                                          (4.7) 
Mean                             5.7       5.6       8.1       7.2 
                                N = 20    N = 7     N = 45    N = 72 

F values: 
  Performance:    F = 5.4 (df = 2,69), P < 0.01 
  Self-efficacy:  F = 2.5 (df = 1,69), P > 0.05 
  Interaction:    F = 2.0 (df = 2,69), P > 0.05 

Correlation coefficients: 
  Performance level and self-efficacy (before) = 0.17, P > 0.05 
  Performance level and self-efficacy (after)  = 0.44, P < 0.05 


Related task 
                                    Performance level 
Self-efficacy certainty           0         1         2        All 
Before performance               6.5       8.1       9.5       7.9 
                                (4.1)     (2.7)     (1.1) 
After performance                5.8       9.6       9.9       8.0 
                                (4.6)     (0.7)     (0.4) 
Mean                             6.1       8.8       9.7       8.0 
                                N = 32    N = 12    N = 28 

F values: 
  Performance:    F = 13.0 (df = 2,69), P < 0.01 
  Self-efficacy:  F = 1.3 (df = 1,69), P > 0.05 
  Interaction:    F = 3.2 (df = 2,69), P < 0.05 

Correlation coefficients: 
  Performance level and self-efficacy (before) = 0.42, P < 0.05 
  Performance level and self-efficacy (after)  = 0.51, P < 0.05 

It was found that for the first task there was no significant correlation between 
self efficacy and subsequent performance level. After task performance without 
accuracy feedback the performance level correlated significantly with the subsequent 
self efficacy (r = 0.44). Although two-way ANOVA 
(Performance x Trials) showed significant variation in self efficacy only for performance 
levels, there was a non-significant trend for those subjects who were more 
accurate in performance to show greater increases in self efficacy: 0.1, 0.6 and 2.1 
point increases with performance accuracy. For the related task on performance 
trial 1, there was a significant interaction between performance accuracy and change 
in self efficacy. Those subjects who failed to get either question correct showed a 0.7 
decrease, whereas those with one and both correct showed a 1.5 and 0.4 point 
increase in self efficacy certainty respectively. 
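
The change scores quoted for the related task follow directly from the Table 1 means, and the overall before and after means can be recovered as N-weighted averages. The sketch below simply redoes that arithmetic; the names are illustrative, and N-weighting is an assumption about how the overall means were formed.

```python
# Related task, performance trial 1 (Table 1):
# performance level -> (N, mean certainty before, mean certainty after).
LEVELS = {0: (32, 6.5, 5.8), 1: (12, 8.1, 9.6), 2: (28, 9.5, 9.9)}

# Change in mean certainty from before to after, per performance level.
changes = {
    level: round(after - before, 1)
    for level, (_, before, after) in LEVELS.items()
}


def pooled(index):
    """N-weighted mean across performance levels (index 1 = before, 2 = after)."""
    total_n = sum(row[0] for row in LEVELS.values())
    return round(sum(row[0] * row[index] for row in LEVELS.values()) / total_n, 1)


overall_before = pooled(1)
overall_after = pooled(2)
```

The computed changes are -0.7, 1.5 and 0.4, and the pooled means are 7.9 and 8.0, matching the figures given in the text and table.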


After the second basic task performance trial, which was followed by accuracy 
feedback, the correlation between the two variables decreased. This is associated 
with an unexpected increase in self efficacy for nine subjects who failed to get either 
question correct, from 5.4 to 7.1. After the performance trial on the related task 
there was an increase in correlation, which is associated with greater variation in self 
efficacy after the trial. Although there was no significant interaction between self 
efficacy change and related task performance as in performance trial 1, there was a 
similar trend. Efficacy certainty decreased slightly for those getting no questions 
correct. 


TABLE 2 


MEAN SELF-EFFICACY CERTAINTY RATINGS BEFORE AND AFTER PERFORMANCE TRIAL WITH 
PERFORMANCE ACCURACY FEEDBACK (STANDARD DEVIATIONS BRACKETED) 


Basic task 
                                    Performance level 
Self-efficacy certainty           0         1         2        All 
Before performance               5.4       5.4       8.9       7.7 
                                (4.?)     (4.5)     (2.5) 
After performance                7.1       6.1       9.1       8.2 
                                (3.?)     (4.6)     (2.5) 
Mean                             6.3       5.8       9.0       8.0 
                                N = 9     N = 16    N = 47 

F values: 
  Performance:    F (2,69) = 8.9, P < 0.01 
  Self-efficacy:  F (1,69) = 4.6, P < 0.05 
  Interaction:    F (2,69) = 1.0, P > 0.05 

Correlation coefficients: 
  Performance level and self-efficacy (before) = 0.41, P < 0.05 
  Performance level and self-efficacy (after)  = 0.31, P < 0.05 


Related task 
                                    Performance level 
Self-efficacy certainty           0         1         2        All 
Before performance               3.9       6.8       8.6       7.6 
                                (4.4)     (4.3)     (3.3) 
After performance                3.0       8.8       8.7       8.0 
                                (3.5)     (1.8)     (3.0) 
Mean                             3.5       7.8       8.7 
                                N = 10    N = 12    N = 50 

F values: 
  Performance:    F (2,69) = 13.7, P < 0.01 
  Self-efficacy:  F (1,69) = 2.4, P > 0.05 
  Interaction:    F (2,69) = 0.8, P > 0.05 

Correlation coefficients: 
  Performance level and self-efficacy (before) = 0.4, P < 0.05 
  Performance level and self-efficacy (after)  = 0.67, P < 0.05 


Mathematics self concept in relation to self efficacy 

Three-way ANOVA (Self concept groups x Task x Self efficacy trials; 1
between, 2 within factors) showed significant main effects for self concept group
and for trials, and an interaction between these two factors (F = 2.1, df = 8,272,
P < 0.05), but no significant main effect for the task factor (F = 3.5, df = 1,68,
P > 0.05). Figures 1 and 2 show these patterns of self efficacy and in particular the
relative fluctuations in self efficacy for the low mathematics self concept groups.
For both tasks there were relative increases in self efficacy certainty when the class
teacher conducted the assessment and at the end of the five-trial series. In the case of
the class teacher assessment trial, only the decrease from trial 3 to 4 for the related
task was significant (t = 2.4, df = 13, P < 0.05). At the end of the five trials only
the increase for the basic task was significant (t = 2.2, df = 13, P < 0.05). The
results are shown in Figures 1 and 2.


Mathematics self concept and accuracy of self efficacy judgments 

Performance on the task was categorised into: high, with two questions correct;
medium, with one correct; low, with none correct. Certainty ratings were also
categorised into: high, 8-10; medium, 4-7; and low, below 4. Subjects whose
performance levels matched their certainty ratings were classified as having accurate self
efficacy. Subjects whose performance level was below their certainty level were
classified as over-estimators. Those with performance levels above their certainty
level were classified as under-estimators. Figure 3 shows the relationships both
before and after the first performance trial.
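
The classification rule just described can be sketched in a few lines. This is an illustration only, not the author's procedure: the function names are hypothetical, and the treatment of any non-integer rating between 7 and 8 (counted here as medium) is an assumption, since the text gives only the bands 8-10, 4-7 and below 4.

```python
# Illustrative sketch of the accuracy classification described in the text.
# Function names are hypothetical; band edges follow the paper's categories.

ORDER = {"low": 0, "medium": 1, "high": 2}


def performance_level(n_correct: int) -> str:
    """Map number of questions correct (0, 1 or 2) to a performance level."""
    return {0: "low", 1: "medium", 2: "high"}[n_correct]


def certainty_level(rating: float) -> str:
    """Map a certainty rating to high (8-10), medium (4-7) or low (below 4)."""
    if rating >= 8:
        return "high"
    if rating >= 4:  # assumption: ratings between 7 and 8 count as medium
        return "medium"
    return "low"


def classify(n_correct: int, rating: float) -> str:
    """Compare performance level with certainty level."""
    perf = ORDER[performance_level(n_correct)]
    cert = ORDER[certainty_level(rating)]
    if perf == cert:
        return "accurate"
    # Performance below certainty means the subject over-estimated.
    return "over-estimator" if perf < cert else "under-estimator"
```

For example, a subject rating certainty at 9 who gets no question correct would be counted as an over-estimator under this sketch.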
FIGURE 1
SELF-EFFICACY CERTAINTY RATINGS OVER THE 5 TRIALS FOR THE 3 MATHS SELF-CONCEPT GROUPS
BASIC TASK
[Line graph not reproduced. Vertical axis: certainty rating (4 to 9); horizontal axis: Trials 1 to 5; legible curve labels: Medium Self Concept, Low Self Concept.]

FIGURE 2
SELF-EFFICACY CERTAINTY RATINGS OVER THE 5 TRIALS FOR THE 3 MATHS SELF-CONCEPT GROUPS
RELATED TASK
[Line graph not reproduced.]
FIGURE 3
PERCENTAGE ACCURACY OF SELF-EFFICACY BEFORE AND AFTER 1ST PERFORMANCE TRIAL FOR DIFFERENT
MATHS SELF-CONCEPT GROUPS
[Bar chart not reproduced. Bars: before trial, after trial. Self-concept groups: Low (n = 22, scores 0-5), Medium (n = 23, scores 6-9), High (n = 27, scores 10-14).]


There was a significant association between the frequency of accuracy before performance
and higher self concept level (χ² = 10.3, df = 2, P < 0.01). After the
first performance trial there were significantly more accurate self efficacy ratings
(McNemar’s χ² = 9, df = 1, P < 0.05), and still a significant association between
accuracy and self concept level (χ² = 6.8, df = 2, P < 0.05).


FIGURE 4
PERCENTAGE ACCURACY OF SELF-EFFICACY BEFORE AND AFTER 2ND PERFORMANCE TRIAL FOR DIFFERENT
SELF-CONCEPT GROUPS
[Bar chart not reproduced. Bars: before, after.]
The tendency for accuracy before performance trial 2 to be associated with
higher self concept level was not significant (χ² = 5.0, df = 2, P > 0.05). After
this trial there was no increase in accurate self efficacy judgments (McNemar’s
χ² = 0.3, df = 1, P > 0.05). However, there was still a significant association
after this trial between accuracy and self concept level (χ² = 7.8, df = 2,
P < 0.05), representing a difference in accuracy between the low and medium self
concept groups.
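
The significance levels quoted for these chi-square statistics can be checked by hand, because the upper-tail probability has a closed form for 1 and 2 degrees of freedom. The sketch below is a reader's check under that standard result, not part of the original analysis; the function names are hypothetical.

```python
import math


def chi2_tail_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 df.

    A 1-df chi-square is a squared standard normal, so
    P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def chi2_tail_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 2 df: exp(-x / 2)."""
    return math.exp(-x / 2.0)


# Statistics reported in the text, grouped by degrees of freedom.
association_tests = [10.3, 6.8, 5.0, 7.8]  # self concept level x accuracy, df = 2
mcnemar_tests = [9.0, 0.3]                 # change in accuracy, df = 1

for x in association_tests:
    print(f"chi-square = {x}, df = 2, P = {chi2_tail_df2(x):.4f}")
for x in mcnemar_tests:
    print(f"chi-square = {x}, df = 1, P = {chi2_tail_df1(x):.4f}")
```

Run as written, the tail probabilities fall on the same side of the 0.05 and 0.01 thresholds as the reported results.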


DISCUSSION 


The medium to high reliability found in this study relates only to conditions in 
which subjects are familiar with the task involved in self efficacy assessment. The 
degree of retest reliability over this short period for a specific construct corresponds 
to that found over a one-week period for the more general construct of maths self 
concept. The results also indicate overall a similar degree of reliability in assessing 
self efficacy level and strength by a familiar and unfamiliar assessor. However, there 
was a slight but not always significant increase in self efficacy strength when
assessed by the familiar class teacher. The increased strength was most apparent for 
the children with low mathematics self concepts. This suggests that there might be 
relevant child characteristics which affect the consistency of assessment between a 
familiar and unfamiliar assessor. 


As regards validity, there is evidence that measures of self efficacy strength for 
different tasks can be differentiated from one another. This supports discriminant 
validity of this assessment method for task specific self efficacy. Differential change 
in self efficacy — in response to different mathematics attainment — was found in 
both sets of responses to performance on the related task, but was significant only 
after the first performance. This is consistent with the interpretation that familiarity 
with performing the task provides as much information for self efficacy judgments 
as performance and accuracy feedback, though these results may be due to the 
sequence of no feedback — feedback in this design. 


Results for the basic task were more difficult to interpret. On the first trial there 
was a non-significant trend for those with higher levels of mathematics attainment 
to show greater increases in self efficacy. On the second trial of the basic task, 
children with no questions correct showed the largest increase in certainty ratings, 
whereas those with both correct were so certain that there was little scope for more 
certainty. This reversal of self efficacy change cannot be attributed simply to this 
ceiling effect as children with one question correct did not increase their self efficacy 
more than those with none correct. This pattern could be interpreted in terms of the 
tendency of those with low maths self concept — probably the lowest attainers on 
this task — to have unrealistic self efficacy judgments. This tendency might be 
found when children have experienced a series of performance difficulties as in these 
trials. 


Findings are also consistent with the hypothesised relationship between general 
self judgments of mathematics ability — mathematics self concept — and specific 
self judgments of capability — self efficacy. Over the series of trials children with 
higher maths self concept had consistently higher self efficacy certainty ratings. In 
all cases the low self concept group showed some significant increases in certainty 
ratings when assessed by someone familiar to them. This group also showed some 
significant increase for the last self efficacy assessment. In one case this rose to a
level above that for children with medium mathematics self concept. Reference has 
already been made to children with low scores on mathematics self concept. What 
these patterns indicate is that these children might show not only lower but more 
variable self efficacy strength. This interpretation is consistent with another finding, 
that children with low mathematics self concept are more likely to have
inaccurate self efficacy judgments.


A tentative conclusion to this study is that the method of self efficacy 
assessment under particular conditions is as reliable as the best traditional measures 
in the attitudinal field. In connection with the validation of the self efficacy
construct, there was evidence of correlation between self efficacy and task 
performance and mathematics self concept. In some trials differential performance 
was followed by self efficacy changes in the expected directions, and discriminant 
validity was found for measures of different self efficacy judgments. This evidence 
is supportive of those aspects of validity considered. However, caution needs to be 
exercised as validation is a cumulative process which is influenced by theoretical 
assumptions about the construct and its relationships. The theoretical relationships 
between self efficacy and performance which take account of other variables, e.g., 
general self conceptions and prior measures of performance and self efficacy, and 
the presumed causal directions have not been tested in this analysis. These findings 
should not be taken, therefore, to validate the assessment method as a measure of 
self efficacy in its assumed causal influence on performance. 


Requests for reprints should be sent to Dr. Brahm Norwich, Department of Child 
Development and Educational Psychology, University of London Institute of Education, 
24-27 Woburn Square, London, WC1H 0AA.
EXPECTANCY AND MOTIVATION IN REAL LIFE
ACHIEVEMENT SITUATIONS 


By FRED VOLLMER 
(University of Bergen, Norway) 


Summary. The aim of the present study was to find out to what extent expectancy might 
determine the amount of independent thinking manifested in an academic examination 
situation, and whether the latter variable was an independent determinant of grades. Path 
analysis showed that expectancy did have an indirect effect on grades through indepen- 
dent thinking, but this effect was not very strong. The hypothesis that expectancy 
measured shortly before an examination is a realistic estimate of pre-examination 
knowledge level, and that the expectancy/grade connection may be accounted for by 
assuming a relation between pre-examination and examination knowledge levels, found 
strong support. 


INTRODUCTION 


In numerous laboratory studies empirical relationships have been reported between 
measures of expectancy and motivation (see, e.g., Nygård, 1977; Feather, 1982).
The motivational effect of expectancy in real life achievement situations like aca- 
demic examinations, however, has not been extensively explored. While many 
previous studies have reported significant correlations between expectancy and 
grades on university examinations (e.g., Crandall, 1969; Keefer, 1969; Morrison et 
al., 1973; Holen and Newhouse, 1976; Holahan and Kelly, 1978; Kovenklioglu and 
Greenhaus, 1978; Bernstein et al., 1979; Malloch and Michael, 1981; Holahan et al., 
1982; Kimball and Gray, 1982), it is still not clear to what extent this connection is 
mediated by motivation. In a previous study (Vollmer, 1984), expectancy measured 
shortly before an examination was found to correlate with a measure of persistence 
in the examination situation. The latter variable also correlated with grades, indi- 
cating that the relationship between expectancy and subsequent performance at least 
in part might be due to differences in effort expenditure between students high and 
low on expectancy. These results, however, were only obtained in a male group, and 
did not hold for women. Moreover, a replication study (Vollmer, 1986) using a 
similar group of students and measuring instruments, failed to find any relationship 
between expectancy and effort expenditure in the examination situation, both for 
females and males. 


An academic examination, however, at least the type requiring students to write 
essays, is a complex achievement situation. It consists in the performance of 
different types of activities, e.g., producing information, organising it into a struc- 
tured whole, comparing various materials, critically analysing and discussing con- 
cepts, theories, experiments, making evaluations, drawing conclusions, etc. In 
achievement situations of this type it is therefore possible that expectancy may affect 
some specific aspects of behaviour rather than the overall and general quantity of 
expended effort. That is, while persons with high expectancy may not spend more 
total time on their examination papers, or produce a higher total number of words
(or some other gross persistence measure), than persons with low expectancy, there 
may be differences between the two groups in amount of some specific type of 
activity. 

What type of activity in an academic examination situation, then, may be 
related to expectancy? A clue to this problem may be obtained by examining what 
kind of variable expectancy is. 
According to attributional theories of motivation (Kukla, 1972; Meyer, 1973;
Meyer and Hallermann, 1977; Weiner, 1980), expectancy is an expression of a stable 
personality dimension, namely perceived ability, and significant correlations 
between expected grades on university examinations and measures of perceived 
ability have been reported by Holahan and Kelly (1978), Motowidlo (1981), and 
Vollmer (1986). 


What kinds of activities in the examination situation are most likely to be 
influenced by people’s expectancy as an expression of their perceived ability? The 
various activities a person must/should engage in when writing an examination 
paper can be sorted into two broad classes. One thing a person obviously should do 
is demonstrate how much he or she knows. This is mainly done by reporting, as 
correctly and in as much detail as possible, viewpoints, theories, and facts contained 
in the curriculum. The other thing a person can do, in addition to describing facts, 
theories, and concepts, is to critically discuss and evaluate them. While the first kind 
of activity may seem rather safe, requiring only that the person knows, understands, 
and can report what others have said, the second type may seem more challenging, 
but also dangerous, in that persons have to do some thinking of their own, rely on 
their own judgments, assert their own viewpoints. It seems reasonable to assume, in 
turn, that persons who feel uncertain about their own ability (have low self- 
confidence) will be less willing to engage in such independent thinking than persons 
with high perceived ability. 


The main aim of the present study, then, will be to examine to what extent 
expectancy, as an expression of perceived ability, is related to amount of inde- 
pendent thinking manifested in the examination situation, and to what extent the 
latter variable is a determinant of grade. 


In a previous study (Vollmer, 1986) expectancy was also found to correlate with 
amount of work expended in preparing for an examination. The reason why 
students expect to do well if they have worked hard probably is that past work is 
believed to determine level of knowledge, and the latter variable to influence grades. 
In other words, expectancy measured shortly before an examination may be a 
subjective estimate of pre-examination knowledge level. This interpretation is not in 
conflict with the finding that expectancy is also an expression of a more stable 
personality trait like perceived ability. For students who have worked equally hard 
in preparing for an examination may have different conceptions of how much they 
have learned and know. These differences, in turn, may be due to individual 
differences in perceived ability (Kukla, 1972; Meyer, 1973). 


This interpretation does, however, suggest a way of understanding the relation- 
ship between expectancy and grades which does not necessarily imply any moti- 
vational links. If an important determinant of grades is the amount of factual 
knowledge shown in the examination paper, and this latter dimension depends on 
how much the student has learned and knows shortly before the examination, expec- 
tancy may relate to later grades simply because expectancy is a realistic estimate of 
actual pre-examination knowledge level. According to this interpretation, all the 
expectancy-performance relationship may indicate is that there is some kind of a 
relation between people’s pre-examination knowledge levels (of which expectancy is 
an indicator) and the amount of information they are able to produce in the 
examination situation (of which grade is an indicator). 


The latter hypothesis seems simple and highly plausible a priori compared to the 
theory that expectancy determines the amount of independent thinking people are 
willing to engage in, and thereby what grades they obtain. It would seem wise, 
consequently, to find out how much of the relationship between expectancy and 
grades can be accounted for by knowledge level. It can then be ascertained whether
independent thinking has any independent effect on grade, in addition to the effect 
of knowledge level, and to what extent expectancy determines grade through the 
variable of independent thinking when knowledge level is controlled. 


METHOD 


Subjects were 117 students, 77 female and 40 male, taking an undergraduate 
psychology examination at the University of Bergen. 


Preparation work 

Students were asked to indicate how many semesters of full-time work they had
spent in preparing for the examination. A seven-point scale was used ranging from
"½ semester" to "more than 3 semesters". Students were also asked: "How many
hours per week on the average do you spend studying? This question may be
difficult to answer because how much one works may vary from day to day and
from week to week. Try, however, to describe a typical week". As an overall
estimate of how much work a person had invested in studying for the examination,
number of hours per week was multiplied by number of semesters.
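
The preparation work index is a simple product, which the following sketch illustrates. The function name is hypothetical, and how the scale point for the highest category was coded numerically is not stated in the text, so the caller is assumed to supply a numeric semester value.

```python
def preparation_work(hours_per_week: float, semesters: float) -> float:
    """Overall preparation estimate: weekly study hours times semesters spent.

    `semesters` is the numeric value assigned to the student's scale point;
    the coding of the open-ended top category is an assumption left to the caller.
    """
    return hours_per_week * semesters


# Example: a student reporting 30 hours per week over 2 semesters.
index = preparation_work(30, 2)
print(index)
```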


Perceived ability 

A self confidence scale (Vollmer, 1986) was used to measure perceived ability. 
The test consists of seven items all assumed to relate to a person’s perceived ability in 
connection with academic examinations. High scores indicate high perceived ability. 


Expectancy 

Expectancy was evaluated by asking students one week before examination to
"estimate as realistically as possible the grade (1.0-4.0) you in fact think you will get
on the examination". The scale was reversed for correlational analyses so that high
numbers meant high expectancy.


Knowledge and independent thinking 

To pass the examination students had to write two separate papers (on two 
different days). Each paper was evaluated by three teachers, and final grade was 
fixed on the basis of each teacher’s evaluations of both papers. In connection with 
the present examination, teachers were also asked to evaluate both papers of each 
student as to amount of knowledge and independent thinking on five-point rating 
scales. In this way six different ratings of knowledge and independent thinking 
manifested in the examination situation were available for each student.


Examination grades 

Grades on the psychology examination range from 1.0 (best possible) to 4.0.
For correlational analysis the scale was reversed so that a high number meant good 
performance. 
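
The paper does not say how the grade and expectancy scales were reversed. One common choice, assumed here for illustration, is reflection about the scale midpoint, so that 1.0 becomes 4.0 and 4.0 becomes 1.0. A linear reversal of this kind changes the sign of a correlation but not its size.

```python
BEST, WORST = 1.0, 4.0  # end points of the grade scale given in the text


def reverse_grade(grade: float) -> float:
    """Reflect a grade so that higher numbers mean better performance.

    Assumption: linear reflection (best + worst - grade); the source
    states only that the scale was reversed.
    """
    return BEST + WORST - grade


print(reverse_grade(1.0), reverse_grade(2.5), reverse_grade(4.0))
```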


RESULTS 
Alpha coefficients of reliability for the seven-item self confidence scale, and the
six indicators of knowledge and of independent thinking were 0.77, 0.87, and 0.85
respectively.
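
For readers unfamiliar with the alpha coefficient, the sketch below applies the standard Cronbach formula to a matrix of scores (rows are students, columns are items or ratings). The data shown are invented for illustration and are not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / total variance)."""
    k = len(rows[0])
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: four students, three indicators on a five-point scale.
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
]
print(round(cronbach_alpha(ratings), 2))
```

Population variances are used throughout; sample variances would give the same coefficient because the ratio is unchanged.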
The two hypotheses as to how the expectancy-performance relationship can be 
explained were represented in a path diagram (Figure 1). 
FIGURE 1


HYPOTHETICAL RELATIONSHIPS BETWEEN PREPARATION WORK, PERCEIVED ABILITY, EXPECTANCY, 
KNOWLEDGE, INDEPENDENT THINKING AND GRADE, WITH PATH COEFFICIENTS 





The assumptions that preparation work (X1) and perceived ability (X2) are
determinants of expectancy (Y1), are represented by one-way paths from X1 and X2
to Y1. As no hypotheses were formulated regarding causal relationships between
work and perceived ability, X1 and X2 are connected by a bi-directional curved
arrow. The hypothesis that expectancy, as an indicator of knowledge level right
before the examination, determines amount of knowledge manifested in the
examination situation (Y2), and thereby grade (Y4), is represented by the path from
Y1 to Y2 to Y4. The hypothesis that expectancy is a determinant of independent
thinking in the examination situation (Y3), and thereby of grade, is shown by the
path from Y1 to Y3 to Y4.


As to the relationship between knowledge and independent thinking, it could be 
argued that conceptually the two dimensions may be relatively independent. A 
person can produce much information in a paper and yet do very little in the way of 
discussing, analysing, and posing questions (i.e., independent thinking). And vice 
versa, a person can produce comparatively little information but still analyse and 
discuss problems extensively. As measured, however, several factors are in
operation that can be expected to produce a correlation between the variables. Pairs
of Y2 and Y3 estimates are produced by the same evaluator, at the same time, using
the same rating scale, probably after having formed a global impression of the paper
which may subsequently influence both ratings. The possibility of a correlation
between Y2 and Y3 due to constant rating error, or halo effect (cf. Guilford, 1954;
Kerlinger, 1973), has been represented in the path diagram by allowing the disturbance
terms (ζ2 and ζ3) of Y2 and Y3 to correlate.


Data for the model were correlations. Path coefficients were estimated by use
of the maximum likelihood method of LISREL modelling (Jöreskog and Sörbom,
1981), and are shown in Figure 1.
The model’s over-all fit to the data was quite good as shown by a small chi-square
(10.04, df = 7, P = 0.187) and mean residual (0.035).


All path coefficients in the model, except the value for the bi-directional path
between preparation work and perceived ability, were significant, as shown by
t-tests. Knowledge and independent thinking manifested in the examination paper
both contributed independently in determining grade, but knowledge was a much
better predictor than independent thinking. The total effect of expectancy on grade
through the knowledge and independent thinking variables was 0.413, leaving a
residual variance of only 0.047, and indicating that the relationship between expectancy
and grade could be adequately explained by the variables knowledge and independent
thinking. The effect of expectancy on grades through knowledge (0.297)
was, however, much stronger than the effect through independent thinking (0.11).
Finally, preparation work and perceived ability were found to be independent
determinants of expectancy.
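
In path analysis an indirect effect is the product of the coefficients along a path, and a total indirect effect is the sum over paths. The sketch below illustrates that arithmetic with the effects reported above. The individual path coefficients appear only in Figure 1, so the two-coefficient example uses invented values, and reading the 0.047 residual as the part of the expectancy-grade correlation not carried by the two mediators is an assumption.

```python
import math


def indirect_effect(coefficients):
    """Indirect effect along one path: the product of its path coefficients."""
    return math.prod(coefficients)


# Hypothetical coefficients for a two-step path (not the values in Figure 1).
example = indirect_effect([0.5, 0.6])

# Effects reported in the text.
via_knowledge = 0.297
via_independent_thinking = 0.11
reported_total = 0.413
reported_residual = 0.047

summed = via_knowledge + via_independent_thinking
implied_correlation = reported_total + reported_residual

print(f"sum of the two reported indirect effects: {summed:.3f}")
print(f"reported total effect: {reported_total:.3f}")
print(f"total plus residual: {implied_correlation:.3f}")
```

The sum of the two rounded effects differs slightly from the reported total, which is presumably a rounding difference in the published figures.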


The model was also tested separately for women and men. In both sex groups
the fit between model and data was quite good. Chi-squares for women and men
were 7.41 and 12.08 (both non-significant), and average residuals 0.032 and 0.059.


DISCUSSION 


According to traditional theories of achievement motivation (e.g., Kukla, 1972; 
Meyer, 1973; Atkinson, 1974), being more or less motivated to act consists in the 
differential mobilisation of energy to perform the activity, or in expending more or 
less effort on the task, and is typically thought to be expressed in such measures of 
persistence as number of responses, or time spent on task performance. The central 
idea in the present paper was that on complex tasks like examinations the moti- 
vational effect of expectancy might not primarily be one of driving people generally 
to expend more or less effort, but rather one of providing them with more or less 
courage to do some thinking of their own. Such a connection seemed plausible in 
view of the assumption that expectancy is an expression of a person’s perceived 
ability (or self-confidence). The data supported these assumptions in that significant 
relationships were found between perceived ability and expectancy, on the one hand, 
and between expectancy and independent thinking on the other. 


Independent thinking, however, was not found to be a very strong determinant 
of grade when knowledge was controlled. The explanation for this can probably be 
found in the high correlations between independent thinking and knowledge, and 
between the latter variable and grade. With knowledge partialled out, the 
"independent" effect of independent thinking became small. The basic source of
difficulty probably is that knowledge and independent thinking were not measured 
independently. Global ratings of both dimensions were given on simple five-point 
scales by the same rater, probably under the common influence of a conception of 
overall grade. An important task for future research in this area is consequently to 
develop more objective ways of measuring independent thinking. 


The assumption that students with high expectancy get better grades than those 
with low expectancy, because the former persons have worked harder in preparing 
for the examination and therefore have learned more than the latter, also found 
support. Significant relationships were found between preparation work and expec- 
tancy, and between expectancy and knowledge level manifested in the examination 
situation. The latter variable, in turn, proved to be strongly related to grade. 


It should be pointed out, however, that the finding that expectancy predicts 
knowledge level as manifested in the examination situation, and thereby grades, may 
also be given a motivational interpretation. Having worked hard in preparing for an 
examination and objectively knowing much before the examination, does not by
itself lead to the production of a high amount of information in the examination 
situation. People who have worked equally hard, and have the same levels of 
knowledge, may not manage or be motivated to activate, use, and demonstrate their 
knowledge to equal degrees. Here it is conceivable that high expectancy (believing 
one knows a lot) is encouraging and enables people to be productive, whereas low 
expectancy is discouraging and hinders people in trying to produce knowledge. 
Though it is reasonable to assume that expectancy may have such a motivational 
effect on knowledge production, it is also important to stress that this effect must be 
limited. High expectancy cannot make people produce knowledge they do not have. 
Determining how much of the relationship between expectancy and amount of 
knowledge manifested in the examination situation may be due to motivational 
factors, is, however, beyond the scope of the present study. 


In conclusion, results of the present study support both the notion that expec- 
tancy may be an indicator of pre-examination knowledge level and therefore relate 
to grade, and the hypothesis that expectancy may have motivational consequences in 
the examination situation and thereby determine performance outcome. An impor- 
tant theoretical implication of the present results is that in complex achievement 
situations like examinations, the motivational consequences of expectancy may 
consist in doing more or less of certain types of activity like independent thinking, 
rather than in general effort expenditure or persistence. 


Requests for reprints should be sent to Dr. Fred Vollmer, Department of Personality 
Psychology, University of Bergen, Sydnesplass 9, N5000 Bergen, Norway. 
THE BOEHM TEST OF BASIC CONCEPTS: AN ENGLISH
STANDARDISATION 


By EILEEN F. SMITH 


SUMMARY. The Boehm Test of Basic Concepts (Form A) was standardised on 1,928 children in 
England. The test directions were modified for British use. Raw scores were converted to standard 
scores by age level in half-years from 3:6 to 7:6. An equivalence study found the two alternate forms 
(Forms A and B) to yield roughly equivalent scores. Relationships were found between BTBC scores 
and both social class and type of domiciliary area. 


INTRODUCTION 
The Boehm Test of Basic Concepts (BTBC) was designed to measure children’s mastery
of concepts considered necessary for achievement in the first years of school (Boehm, 1971). 
Boehm queried the assumption that children have mastered the basic concepts necessary for 
understanding and following directions by the time of school entry. 


Use of the BTBC in the UK has been limited by the lack of British norms. The BTBC 
manual provides only percentile equivalents of raw scores by US school grades and 
socio-economic level for ‘beginning-of-year’ and ‘mid-year’ testing. 


The BTBC directions are straightforward, the scoring is unambiguous, and administration
requires no special training. In this 50-item test, respondents are required to indicate
which of several objects pictured is ‘in the middle’, is ‘half gone’ etc. The test is quick to
administer and the children enjoy it (Dahl, 1973). The major weaknesses, according to the 
reviewers, relate to lack of validity evidence (Freeman, 1972; Lawlor, 1972; McCandless, 
1972; Noll, 1972; Proger, 1972; Dahl, 1973). 


This paper will provide details of a procedure designed to produce standardisation data 
on the use of the Boehm Test with English primary school children. The issue of validity will 
be taken up in a later paper. 


METHOD 
Sampling 
A one-in-seven sample of schools was drawn at random from one non-metropolitan and 
two metropolitan areas. The distribution of the 78 schools across rural, inner city and ‘other 
urban’ areas was not significantly different (chi square 0.139; df 2; P > 0.05) from that found
by HMI (DES, 1978). 


The sample of children (N = 1928), aged 3½ to 7½ years, was drawn by birth-dates so
that each child was tested within a month of his birthday or his birthday plus six months.
Evidence from the British Ability Scales sampling population (Elliott, 1983) confirms the
opinion that this method produces a representative cross-section of socio-economic
groupings and ability levels.


According to whichever parent's occupation ranked highest each child was classified by
socio-economic group as follows: Group 1: professional (including nurses and teachers) and
managerial (i.e., the Registrar-General's classes I and II); Group 2: other ‘white collar’ and
skilled manual (i.e., the Registrar-General’s class III); Group 3: semi-skilled and unskilled
manual (i.e., the Registrar-General's classes IV and V) and parents in long-term unemployment;
Group 4: known to be manual workers but the precise nature of the job unknown, i.e.,
groups 2 or 3; Group 5: known to be non-manual but insufficient information to distinguish
between groups 1 and 2; Group 6: occupation unknown. The distribution of SES groups
classified according to parental occupation or occupation before recent unemployment was
Group 1, 16.9 per cent, Group 2, 37.4 per cent, Group 3, 23.4 per cent, Group 4, 7.2 per
cent, Group 5, 1.0 per cent, Group 6, 14.2 per cent.
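
The grouping rule can be written as a small decision function. This is an illustrative sketch only: the function and parameter names are hypothetical, and the order in which the conditions are checked is an assumption, since the paper lists the groups without stating a precedence.

```python
from typing import Optional


def ses_group(rg_class: Optional[str] = None,
              long_term_unemployed: bool = False,
              manual: Optional[bool] = None) -> int:
    """Assign one of the six socio-economic groups described in the text.

    rg_class: Registrar-General's class ("I" to "V") when known.
    manual: True or False when only the manual/non-manual status is known.
    """
    if rg_class in ("I", "II"):
        return 1
    if rg_class == "III":
        return 2
    if rg_class in ("IV", "V") or long_term_unemployed:
        return 3
    if manual is True:
        return 4  # manual, precise job unknown (group 2 or 3)
    if manual is False:
        return 5  # non-manual, cannot distinguish groups 1 and 2
    return 6      # occupation unknown


# Percentages reported for Groups 1 to 6.
reported = [16.9, 37.4, 23.4, 7.2, 1.0, 14.2]
print(round(sum(reported), 1))
```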
Procedures

The BTBC instructions (Forms A and B) were modified for UK children after Form A
modifications had been tested on 300 children. Each child in the sample was tested
individually with the adapted version of Form A (N = 1,928). A subsample of 144 children
from 14 schools was tested with both parallel forms: half of each age group (4:0, 5:0, 6:0, 7:0) 
were tested first with Form A and half with Form B; within several days each child was tested 
with the other form. 


Statistical analyses 

Even though it is doubtful that the BTBC is unidimensional, this assumption was tested.
Item response data were calibrated and tested for goodness of fit to the Rasch latent trait
model. The obtained total raw scores (Form A) were scaled by age and converted to T scores
normalised on percentiles. Internal consistency reliability was estimated by obtaining KR-20
coefficients for each of the nine age levels (3:6-7:6) and the mean estimate by use of Fisher’s z
transformation. Alternate form equivalence was tested by product moment correlation.
Differences in scores between ‘beginning-of-year’ and ‘mid-year’ testing and between boys’
and girls’ scores were tested by one-way ANOVA.
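
The reliability procedure described above can be sketched in Python as follows. This is an illustration only: the toy response matrix is invented, the function names are hypothetical, the total-score variance uses an N denominator, and the Fisher z average is unweighted (the text does not say whether the age levels were weighted by sample size).

```python
import math


def kr20(responses):
    """Kuder-Richardson formula 20 for a list of per-child lists of 0/1 item scores."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(child) for child in responses]
    mean_total = sum(totals) / n
    # Population (N-denominator) variance of total scores: an assumption of this sketch.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for item in range(k):
        p = sum(child[item] for child in responses) / n  # item facility
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def fisher_z_mean(coefficients):
    """Average coefficients on Fisher's z scale, then transform back."""
    z_values = [math.atanh(r) for r in coefficients]
    return math.tanh(sum(z_values) / len(z_values))


# Invented toy data: five children, four items.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy), 2))
print(round(fisher_z_mean([0.6, 0.8]), 3))
```

Averaging on the z scale gives slightly more weight to the higher coefficients than a plain arithmetic mean does, which is the usual reason for preferring it when summarising correlation-like statistics.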


RESULTS 
Table 1 shows the distribution of mean raw scores across the nine age levels. 


TABLE 1
AGE DISTRIBUTION AND MEAN RAW SCORES

Age:     3:6    4:0    4:6    5:0    5:6    6:0    6:6    7:0    7:6
N:       115    164    193    246    228    252    232    267    231
Mean:  15.91  23.11  27.03  31.29  34.34  38.33  41.28  43.17  44.18
SD:     6.81   6.92   7.93   7.65   7.01   5.62   5.50   4.36   4.24


The Rasch calibration and test of goodness of fit for 50 BTBC items produced a fit for the
whole test over all subjects and items of chi-square = 71679.75; df = 70368; P = 0.347; 17 items
have fit P < 0.2 whereas about 10 would be expected by chance; 14 items have fit P < 0.05 as
opposed to about two or three expected by chance. Therefore the Rasch latent trait model was
rejected as a suitable method of analysis. Some very small numbers of subjects produced
anomalies in the T scores at the lower end of the scoring range, so smoother distributions were
fitted and a table produced for converting any raw BTBC (Form A) score into a standardised
score.
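
A conversion of raw scores to T scores normalised on percentiles might be sketched as below. This is not the authors' procedure in detail: the mid-rank definition of the percentile, the function name and the sample scores are all assumptions, and the smoothing step mentioned above is omitted.

```python
from statistics import NormalDist


def normalised_t_scores(raw_scores):
    """Map each distinct raw score to a normalised T score (mean 50, SD 10)."""
    n = len(raw_scores)
    counts = {}
    for score in raw_scores:
        counts[score] = counts.get(score, 0) + 1

    table = {}
    below = 0  # number of children scoring below the current raw score
    for score in sorted(counts):
        freq = counts[score]
        # Mid-rank percentile: strictly between 0 and 1, so inv_cdf is defined.
        percentile = (below + 0.5 * freq) / n
        table[score] = 50 + 10 * NormalDist().inv_cdf(percentile)
        below += freq
    return table


# Hypothetical raw scores for one age level.
conversion = normalised_t_scores([10, 20, 20, 30])
for raw, t in conversion.items():
    print(raw, round(t, 1))
```

With very few children at a given raw score the percentile, and hence the T score, moves in large steps, which is one way the anomalies at the lower end of the range could arise.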


KR-20 coefficients at each half-year age level from 3:6 to 7:6 were 0.82, 0.80, 0.85, 0.85,
0.83, 0.48, 0.80, 0.74, and 0.75 with a mean estimate of 0.81. The lower estimates at 7:0 and
7:6 reflected the ceiling effect observed in the raw scores at this age level. The mean correlation
coefficient of 0.80 observed between the Form A and Form B raw scores and the median
coefficient of 0.82 compare favourably with the AB coefficients observed by Boehm (median
0.76), confirming her conclusion that the two forms yield essentially equivalent scores (Boehm,
1971). However, even though the total scores on each form are comparable, certain items are
not of equal difficulty on both forms. For example, only 31 per cent of subjects made identical
scores on each form for item 24 (Form A ... the bottle that is almost empty; Form B ...
the basket that is almost full) with facility values of 0.93 for Form A and 0.21 for Form B.
For the remaining 49 items, proportions of identical scores ranged from 0.86 to 0.96.


The null hypothesis of no significant difference between boys’ and girls’ mean scaled
BTBC scores was accepted (F = 1.68; df 1,927). For boys the mean was 49.71, SD 10.15,
N = 965; for girls the mean was 50.29, SD 9.65, N = 963.


There was no significant difference between the mean scaled scores of children tested
during the first half of their school year and those tested during the second half (mean scaled
scores were 49.89 and 50.10, SD 9.87 and 9.94, N = 945 and 983, F = 0.22, df 1927).
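
The two F ratios reported above can be checked approximately from the published group sizes, means and standard deviations alone. The Python sketch below assumes that the standard deviations were computed with an n - 1 denominator, which the text does not state; the function name is illustrative.

```python
def f_from_summary(groups):
    """One-way ANOVA F ratio from (n, mean, sd) tuples, one per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Assumes each sd uses an n - 1 denominator.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Boys versus girls (N, mean, SD as reported).
f_sex, df1_sex, df2_sex = f_from_summary([(965, 49.71, 10.15), (963, 50.29, 9.65)])
# First half versus second half of the school year.
f_half, df1_half, df2_half = f_from_summary([(945, 49.89, 9.87), (983, 50.10, 9.94)])

print(round(f_sex, 2), df1_sex, df2_sex)
print(round(f_half, 2), df1_half, df2_half)
```

Under these assumptions the sex comparison gives an F of about 1.65, close to the reported 1.68, and the time-of-testing comparison gives about 0.22, as reported. Small discrepancies are to be expected because the published means and standard deviations are rounded.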


When the scores within SES groups were compared, the results suggested that children of
the same social class tended to make higher scores on the BTBC if they went to school in a
rural locality that was not notably marked by social disadvantage than did those who attended
a school in any other type of area (F = 7.08, df 1311, P < 0.01). Children who lived and
went to school in urban rehousing estates tended to perform less well than their counterparts
in the inner city (t = 2.36, df 350, P < 0.02) or in other urban areas (t = 3.66, df 623,
P < 0.001) or in non-disadvantaged rural areas (t = 4.88, df 439, P < 0.001). The mean
standard score (47.59) of the inner city children in the sample population in SES groups 2 and
3 was significantly higher (P < 0.02) than that of children in the same SES groups (about 45) who
lived on urban rehousing estates away from the centre of a conurbation.


DISCUSSION 


The standardisation sample appeared to be representative of children in England in terms 
of urban/rural and social class distributions. The absence of data on under-fives who do not 
attend nursery or school might imply that the standard scores for 3- and 4-year-olds should be 
treated cautiously. However, as homogeneity of age distribution was confirmed across SES 
groups, it can be assumed that the under-5s in the sample represent the distribution of social 
class in the total sample population. It would appear then that this English standardisation is 
considerably more representative of the nation's children than is Boehm's sample which is 
acknowledged not to be representative of the USA (Boehm, 1971) and is certainly not
representative of Great Britain.


It is claimed that the individual testing procedures adopted were satisfactory as they 
allowed each child to respond at his or her own pace, eliminated copying and were non- 
threatening. Further, the results were not confounded by children who might have had 
difficulty in placing crosses as required by group testing. 


The ceiling effect apparent at ages 7:0 and 7:6 was also observable in Boehm's (1971) 
sample. It suggests that with 7-year-olds the BTBC is only useful for identifying concepts with 
which some children may not be familiar, or children who are behind their peers in their 
knowledge of the 50 BTBC concepts. 


The finding that there were no overall sex differences in the BTBC scores is in accord with 
current opinion and a number of studies (Macauley, 1978; Nelson and Bonvillian, 1978; Curtis 
and Blatchford, 1981). 


As no significant differences were found between the scaled scores of children tested
during the first half of their school year and those tested during their second half, separate
norms were not computed for the English standardisation. The children Boehm used to
establish ‘beginning-of-year’ norms were not drawn from the same population as those used
for the ‘mid-year’ norming; the SES composition of each group was different and they were
drawn from different cities. Moreover, differences between British primary methods and
organisation and those predominating in the American grade system may also have
contributed to these differences between the UK and US samples.


Even though certain BTBC items are not of equal difficulty on Forms A and B, the total 
scores on each form are comparable. Therefore the standard scores established for Form A 
may be used for Form B. 


To produce three sets of norms for high, middle and low SES groups, as presented in the
BTBC manual, is considered undesirable. The dangers of labelling schools, groups or individuals
so that expectations of achievement of those at the bottom of the ladder are lowered
have been well argued (e.g., Pidgeon, 1970; Pilling and Pringle, 1978). In any event, it appears
that a child's social class is less indicative of school success than are parental attitudes and
aspirations and styles of parenting (Light, 1983). Moreover, despite highly significant
differences between the mean scaled scores of the three broad SES groups, there was
considerable overlap of scores across the groups. The finding of differences between areas of
domicile within social class has social implications but it may be that these results are atypical.
The demographic differences observed were not anticipated. Personal observation during the
research prompted fresh hypotheses and further analysis of the data. Research designed to
examine these differences is required before more generalisable conclusions can be drawn.


Meanwhile, this English standardisation and modification of the BTBC is potentially
more useful than the American (1971) edition for use in the UK.


A COMPARISON OF THE PREDICTIVE VALIDITY OF THE REYNELL
DEVELOPMENTAL LANGUAGE SCALES, THE PEABODY PICTURE
VOCABULARY TEST AND THE STANFORD-BINET INTELLIGENCE
SCALE


By PHIL A. SILVA 


(The Dunedin Multidisciplinary Health and Development Research Unit, University of 
Otago Medical School, New Zealand) 


Summary. Studies of the predictive validity of psychological tests are relatively rare in comparison 
with studies of other types of test validity. This is not surprising as studies of most types of test 
validity (e.g., concurrent, construct) require the collection of data at only one point in time while 
studies of predictive validity can only be done within the context of longitudinal research. There 
have, however, been relatively few such studies and only a few of these have studied the predictive 
validity of tests. The paucity of longitudinal studies is, no doubt, a result of the many problems 
involved in the establishment and maintenance of such long term research (Mednick, 1981). This 
paper reports data on the predictive validity of three tests, the Reynell Developmental Language 
Scales, the Peabody Picture Vocabulary Test, and the Stanford-Binet Intelligence Scale. 


INTRODUCTION 


The Reynell Developmental Language Scales (RDLS) were first published as an
experimental edition in 1969 (Reynell, 1969) and were revised in 1977 (Reynell, 1977). The
revision involved only minor changes and the addition of some items to gain better discrimination
at certain points of the scale and to extend the scale upwards. The only known published
study of predictive validity is described in the manual supplement (Silva et al., 1978).
Correlations between the Verbal Comprehension and the Verbal Expression Scales administered
when the children in that study were 4 years of age and the Wechsler Intelligence Scale
for Children (Full Scale) when they were 7 were 0.71 and 0.63 respectively. The correlations
with Burt Word Reading test scores at age 7 were 0.47 and 0.49 respectively. These
correlations were similar in magnitude to correlations obtained between Stanford-Binet
Intelligence Scale scores at age 4 and the intelligence and reading test scores at age 7. It was
concluded that the RDLS was as efficient a predictor as the Stanford-Binet Intelligence Scale.
While the data reported by Silva et al. (1978) suggested that the RDLS had a moderate degree
of predictive validity, it should be noted that the test was not intended to predict later
performance. Rather, it is a clinical instrument for assessing current language development.


The Peabody Picture Vocabulary Test (PPVT; Dunn, 1965) was designed to provide an
estimate of a subject's verbal intelligence. The 1965 manual stated that a study was needed to
“provide predictive validity co-efficients of the PPVT with school achievement of children
across the full spectrum of intellect at each age level” (p. 47). While many studies of
concurrent and other types of validity have appeared, to the author's knowledge there have
been no published studies of predictive validity.


The current Stanford-Binet Intelligence Scale (SBIS; Terman and Merrill, 1960) is the
third revision of the test. The SBIS is the longest established and probably the most researched
intelligence test available. A longitudinal study of its predictive validity was reported some
years ago by Bradway et al. (1958). Children were first tested at ages 2 and 5½. These initial
IQs correlated at 0.65 with retests 10 years later and 0.59 with 25-year retests.


This paper reports a comparison of the predictive validity of the RDLS and the PPVT 
administered at age 3 and the RDLS and the SBIS administered at age 5. The criteria for the 
validity study included intelligence and reading tests administered at ages 7, 9, and 11. 


METHOD 
Sample 


The sample consisted of those children being followed longitudinally by the Dunedin
(New Zealand) Multidisciplinary Health and Development Research Unit. They have been
assessed at ages 3, 5, 7, 9, and 11. The study and sample are described by McGee and Silva
(1982). The sample was advantaged in terms of socio-economic status in comparison with the
rest of New Zealand and under-representative of the Maori or other Polynesian races.


The sample eligible for the study comprised all those born at Dunedin’s one obstetric 
hospital during the 12-month period between 1 April, 1972, and 31 March, 1973, who were 
known to be resident in Otago when the children were 3 years of age (N = 1,139). Of these, 
1,037 were followed up as 3-year-olds (91 per cent). Those not followed up were either refusals 
(N = 68) or located too late for inclusion (N = 34). The children were followed up and 
assessed as 5-year-olds (N = 991), 7-year-olds (N = 954), 9-year-olds (N = 955), and 
11-year-olds (N = 925). Of the 112 children who were not followed up at age 11, 70 were not 
assessed because of parental refusal to co-operate, 38 were not assessed because they lived in 
areas where it was not possible to arrange assessment (e.g., other parts of New Zealand or 
overseas), and four children had died. In a study to determine whether there were any 
significant differences between children who were or were not assessed at age 11 on measures 
taken in earlier phases of the study, McGee (1985) found that there were no significant 
differences for measures of language, intelligence, and reading. This suggests that those who 
dropped out did not bias the remaining sample. 


Measures 

The experimental edition of the RDLS (Reynell, 1969) was administered when the 
children were aged 3 and 5. The Peabody Picture Vocabulary test (PPVT; Dunn, 1965) was 
administered when the children were 3 and the Stanford-Binet Intelligence Scale (SBIS; 
Terman and Merrill, 1960) when the children were 5. 


At ages 7, 9, and 11, intelligence was assessed with the Wechsler Intelligence Scale for
Children-Revised (WISC-R; Wechsler, 1974). One subtest from each scale was omitted to
save testing time (Comprehension and Picture Arrangement). Results were prorated according
to the test manual recommendations. At the same ages, reading was assessed by the Burt
Word Reading Test (Scottish Council for Research in Education, 1976). Previous research
(Silva, 1981) had shown that the results from this test correlate highly with those from a prose
reading test (r = 0.86) and Gilmore et al. (1981) found that results correlated moderately
highly with tests of reading comprehension (mean r = 0.69) and reading vocabulary (mean
r = 0.75).


At ages 3, 5, and 7, most of the children were assessed within one month of their 
birthdays and at ages 9 and 11 within approximately two months of their birthdays. Those 
assessed outside these age ranges had their scores age-adjusted where appropriate. All testing 
was carried out by trained psychometrists who were not aware of previous results. 


RESULTS 


Table 1 sets out the means, standard deviations, and Pearson product moment correlations
among the results obtained at ages 3 and 5 from the RDLS, the PPVT, and the SBIS.
The distribution of results from the 5-year RDLS was negatively skewed; all other
distributions were normal in form.


TABLE 1

MEANS, STANDARD DEVIATIONS, AND CORRELATIONS AMONG THE LANGUAGE AND INTELLIGENCE TESTS
TAKEN WHEN THE CHILDREN WERE AGED 3 AND 5 YEARS

                          Age
Test                      Tested      N     Mean      SD      1      2      3      4      5

1. RDLS Comprehension       3       1,028   34.8     8.76
2. RDLS Expression          3       1,028   35.8     8.41   0.63
3. Peabody PVT              3         979   23.5     9.57   0.65   0.55
4. RDLS Comprehension       5         936   50.9     5.26   0.52   0.46   0.46
5. RDLS Expression          5         936   50.2     6.66   0.40   0.43   0.30   0.52
6. Stanford-Binet IQ        5         986  105.9    16.67   0.59   0.56   0.60   0.66   0.50


The correlations ranged from 0.30 to 0.66. The tests taken at age 3 and those taken
at age 5 yielded moderate intercorrelations (0.52 to 0.66). The correlations between the
3-year and 5-year measures were also moderate, ranging from 0.30 to 0.60. The test-retest
correlations for the RDLS from age 3 to 5 were higher for the Comprehension scale (0.52)
than the Expression scale (0.43).


Table 2 sets out the means and standard deviations of the 7-, 9-, and 11-year results. All
distributions were normal in form. Also shown are the Pearson product moment correlations
between the 3- and 5-year test results and the criterion test results.


TABLE 2

MEANS, STANDARD DEVIATIONS, AND CORRELATIONS BETWEEN THE LANGUAGE AND INTELLIGENCE TESTS
TAKEN AT AGES 3 AND 5 AND THE INTELLIGENCE AND READING TESTS TAKEN AT AGES 7, 9, AND 11

                                                  RDLS      RDLS                RDLS      RDLS
                                                  Compre-   Expres-   PPVT      Compre-   Expres-   SBIS
                   Age                            hension   sion      at        hension   sion      at
Test               Tested    N     Mean     SD    at age 3  at age 3  age 3     at age 5  at age 5  age 5

Verbal IQ             7     950   105.7   15.13     0.53      0.49     0.49       0.52      0.34     0.67
Performance IQ        7     951   106.8   14.41     0.41      0.34     0.35       0.38      0.24     0.47
Full Scale IQ         7     950   106.8   14.51     0.54      0.48     0.49       0.51      0.33     0.65
Reading               7     942    29.4   13.34     0.39      0.41     0.38       0         0.27     0.53
Verbal IQ             9     953   103.1   16.09     0.51      0.47     0.49       0.54      0.38     0.68
Performance IQ        9     953   105.2   15.20     0.44      0.37     0.37       0.43      0.29     0.53
Full Scale IQ         9     953   104.4   15.72     0.53      0.47     0.49       0.54      0.37     0.68
Reading               9     952    53.9   19.01     0.37      0.39     0.37       0.39      0.30     0.53
Verbal IQ            11     919   104.4   15.72     0.46      0.41     0.47       0.48      0.32     0.63
Performance IQ       11     917   110.9   15.89     0.39      0.32     0.33       0.33      0.28     0.48
Full Scale IQ        11     917   108.2   15.76     0.47      0.41     0.45       0.46      0.33     0.62
Reading              11     919    72.4   20.25     0.37      0.39     0.37       0.38      0.30     0.53





The correlations ranged from 0.24 to 0.68. There was little difference between the mean
predictive validity correlation co-efficients obtained from the 3-year measures and the 5-year
RDLS Verbal Comprehension Scale (0.41 to 0.45). The weakest predictor was the 5-year
RDLS Expressive Language scale and the best predictor was the SBIS. There was very little
reduction in the magnitude of the correlation coefficients over the four-year follow up period.
The criterion measures that were best predicted by the preschool tests were the WISC Verbal
and Full Scale IQs. The correlation co-efficients with the WISC Performance IQs and reading
test scores were similar.
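
The mean predictive validity coefficients can be illustrated with the three age-3 predictors, whose columns in Table 2 are complete. The Python sketch below assumes a plain arithmetic mean over the 12 criterion measures; the paper does not say how the means were formed, and the variable names are illustrative.

```python
from statistics import mean

# Correlations with the 12 criterion measures, in the row order of Table 2
# (Verbal, Performance, Full Scale IQ and Reading at ages 7, 9 and 11).
age3_predictors = {
    "RDLS Comprehension": [0.53, 0.41, 0.54, 0.39, 0.51, 0.44,
                           0.53, 0.37, 0.46, 0.39, 0.47, 0.37],
    "RDLS Expression": [0.49, 0.34, 0.48, 0.41, 0.47, 0.37,
                        0.47, 0.39, 0.41, 0.32, 0.41, 0.39],
    "PPVT": [0.49, 0.35, 0.49, 0.38, 0.49, 0.37,
             0.49, 0.37, 0.47, 0.33, 0.45, 0.37],
}

# Assumption: an unweighted arithmetic mean of the coefficients.
mean_validity = {test: mean(rs) for test, rs in age3_predictors.items()}

for test, value in mean_validity.items():
    print(f"{test}: {value:.3f}")
```

On this assumption the three means come out at roughly 0.45, 0.41 and 0.42, which is consistent with the range of 0.41 to 0.45 quoted in the text.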


DISCUSSION 


The intercorrelations among the preschool tests were of moderate size, suggesting that
there was some degree of overlap in the abilities tapped by the tests. Examination of the test
content would confirm the similarity. The RDLS items involved language comprehension and
expression. Because of the similarity in the preschool tests, it would be expected that the tests
would be similar in terms of predictive ability. This expectation was also based on previous
research (Silva et al., 1978). The results from the present study showed that four of the tests
resulted in similar average predictive validity correlation co-efficients. These were both scales
of the RDLS and the PPVT administered at age 3 and the RDLS Verbal Comprehension Scale
administered at age 5. The correlations ranged from 0.41 to 0.45. The correlations obtained
from the RDLS were considerably lower than those reported in the earlier study by Silva et al.
(1978).


The RDLS Expressive Language Scale administered at age 5 had the lowest correlation
co-efficients with the criterion tests. These low correlations may have been due to a ceiling
effect in this scale, shown by the negatively skewed distribution of scores (Cohen and Cohen, 1975). This
was apparently corrected in the revised edition (Reynell, 1977).


The best predictor in this study was the SBIS. The correlation between the 5-year SBIS
score and the 11-year Full Scale WISC was 0.62, which was very close to the correlation of
0.65 reported by Bradway et al. (1958) between Stanford-Binet IQ assessed at the preschool
level and IQs assessed with the same test 10 years later.


It was of interest to note that there was very little variation between the correlation co- 
efficients obtained between the preschool tests and the criterion tests administered at each 
succeeding age. 


Finally, as would be expected from the verbal nature of the preschool tests, the highest 
predictive validity correlation co-efficients were obtained with the WISC Verbal and Full Scale 
results. The correlations with the Performance IQs and the reading scores were lower and at a 
similar level of magnitude. 


It is concluded that the RDLS and the PPVT have low to moderate predictive validity
when correlated with intelligence and reading tests taken between four and eight years later.
The SBIS was the best predictor included in this study and was found to have moderate
predictive validity at the ages studied. It will be of interest to study the longer term predictive
validity of the tests in future phases of this ongoing longitudinal study.


MATHEMATICAL LOW ATTAINERS CHECKLIST


By DEREK W. HAYLOCK 
(School of Education, University of East Anglia, Norwich) 


SUMMARY. To assess the frequency of 24 statements often associated with low attainers in
mathematics, a checklist was devised and completed for each of 215 mathematical low attainers
aged 9-10 years. Differences between the frequencies of association of the statements with
mathematical low attaining boys and girls were tested for significance.


INTRODUCTION 

It is generally recognised and clearly perceived by any teacher involved with remedial 
groups in mathematics that low attainment in this subject is associated with a complex 
interaction of many different factors. The Cockcroft Report (1982, paragraph 334) posits that
low attainment in mathematics can occur in children whose general ability is not low and that
factors such as inappropriate teaching, lack of confidence, lack of continuity and poor
reading skills can be contributory. Lumb (1978) likewise identifies items associated with the
“slow learning syndrome in mathematics” which include intellectual factors, such as low IQ
and limited memory span, emotional and social factors, such as immaturity, instability and 
poor attendance record, and, in particular, the lack of linguistic skill. The Schools Council 
Working Paper on low attainers in mathematics (Denvir et al., 1982) also pinpoints the lack of 
homogeneity in any group of such pupils. Again identifying factors which are frequently 
associated with failure in school mathematics — they cite physical, physiological and sensory 
defects, tiredness, general ill health, emotional and behavioural problems, negative attitudes, 
anxiety, lack of self-confidence, inappropriate teaching, lack of continuity, gaps in education, 
general slowness in grasping ideas, impoverished home background, difficulties in oral and 
written expression, poor reading ability, and immaturity — they conclude that their low 
attainment in mathematics might well be the only common characteristic of a group of such 
pupils. 


Such a complex pattern of possible factors suggests the obvious need for children having 
difficulties in mathematics, as in any aspect of the school curriculum, to be considered as 
individuals. But in a non-ideal world such children will often be taught in groups. So the first 
purpose of the present research was simply to identify the most common factors associated 
with low attainment in mathematics, based on a checklist derived from the delineation of 
possible factors suggested in the Schools Council working paper. A similar small scale study of 
20 underachievers in mathematics, aged 10-11, undertaken by Ross (1964) in the USA found 
the most frequent characteristics to be that the pupils were at least one year below their grade 
level in reading, that they were underachieving in other school subjects, that they evinced 
withdrawal and defeatist attitudes, and that they showed immaturity and emotional problems. 


In view of the well-rehearsed differences between boys and girls in achievement in
mathematics and the apparently related differences in attitudes to the subject (see, for 
example, the Cockcroft Report, Appendix 2, and the APU second primary survey, 1981), the 
second purpose of the present research was to highlight any significant sex-related differences 
in the occurrence of factors associated with low attainment in mathematics. 


METHOD 


A checklist consisting of 24 statements derived from the description of low attainers in 
mathematics provided by the Schools Council working paper was prepared. Three middle 
school teachers confirmed that they found the wording of these statements clear and that they 
would be able to complete the questionnaire for children known to them to be low attaining in 
mathematics. The statements were presented in the checklist in the order shown in Table 2. 


Copies of the checklist were sent to 49 middle schools (8-12 years) in the city of Norwich
and mainly suburban areas of the county of Norfolk. Of these, 29 responded to the invitation
to participate in the research. The schools had recently administered the NFER Basic
Mathematics Test C (1970) to the pupils in their second years as part of a county-wide mathematics
screening policy. These pupils were thus in their sixth term in middle school and aged 9-10
years. The test is norm-referenced and is designed to assess the understanding of fundamental
mathematical relationships and processes, attempting to eliminate as far as possible the effects
of different teaching methods. The request to the school was that for each child who obtained
a standardised score of 85 or less on this test the checklist be completed by a teacher to whom
the child in question was well known. The teacher was required to indicate the extent to which
it was judged that the statement described the child in question by ringing one option on a
five-point scale: D (definitely), P (probably), U (unsure), PN (probably not), and DN
(definitely not). Completed checklists were returned for a total of 215 pupils, consisting of
126 boys and 89 girls.


RESULTS 


Table 1 shows the percentages of children for whom teachers judged the extent to which
the statement described them to be at each of the D, P, U, PN and DN levels. The small
percentages of “unsure” responses, ranging from 2 per cent to 14 per cent with a mean of
about 7 per cent, are an indication that in most cases the teachers felt able to make a judgment
one way or the other. The statements in Table 1 have been ranked according to the percentage
of pupils for whom the “definitely” (D) response was given, with the percentage of
“probably” (P) responses being used to rank those with equal D scores. This table indicates
therefore the frequency of occurrence in this sample of 9-10 year-olds of the various factors
associated with low attainment in mathematics according to the judgments of their teachers,
with the most frequent factors at the top of the ranking and the least common factors at the
bottom.
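
The ranking rule just described (descending D percentage, with ties broken by descending P percentage) can be expressed as a two-key sort. The Python sketch below uses five rows from Table 1 with abbreviated statement labels; the labels and variable names are illustrative only.

```python
# (abbreviated statement, D percentage, P percentage) for five rows of Table 1
rows = [
    ("immature motor skills", 26, 18),
    ("little commitment in mathematics lessons", 19, 20),
    ("reading skills poorly developed", 53, 24),
    ("better at number than spatial concepts", 19, 36),
    ("perceptual difficulties", 26, 19),
]

# Sort by D descending, then by P descending to separate equal D scores.
ranked = sorted(rows, key=lambda row: (-row[1], -row[2]))

for rank, (statement, d, p) in enumerate(ranked, start=1):
    print(rank, statement, d, p)
```

The resulting order agrees with the relative positions of these five statements in Table 1.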


TABLE 1
RANK ORDER OF 24 STATEMENTS IN MATHEMATICAL LOW ATTAINERS CHECKLIST

                                                                                   D    P    U   PN   DN

The child has been considered low attaining in mathematics from the first
  year in this school.                                                            60   22    8    6    5
The child is low attaining in most areas of the curriculum.                       59   20    3    7   0g
The child's reading skills are poorly developed.                                  53   24    3   10   11
The child is equally poor in all aspects of mathematics.                          49   25    4   13    9
The child's language skills are poorly developed.                                 47   23    4   10   15
The child shows perceptual difficulties, such as reversals of figures, poor
  spatial discrimination.                                                         26   19    3   25   27
The child has immature motor skills.                                              26   18    9   23   24
The child is immature in relationships with other pupils.                         22   17    6   21   34
The child has better understanding of number than spatial concepts.               19   36   14   19   13
The child shows little commitment and interest in mathematics lessons.            19   20    5   31   26
The child is nearly always preoccupied, appearing to find school and learning
  irrelevant.                                                                     16   14    5   27   38
The child shows little commitment and interest in school in general.              15   18    7   24   36
The child experiences social difficulties with the peer group.                    15   14    7   20   43
The child has difficulty in relating to adults.                                   12   21    7   23   37
The child displays behaviour problems, such as hyperactivity, in most
  lessons.                                                                        12   14    2   13   59
The child shows an abnormal level of anxiety towards most tasks in school.        11   15   11   30   33
The child responds sensibly in a one to one conversation with a teacher but
  behaves badly in front of other children.                                       11   13    6   17   53
The child has been absent frequently in the last year.                            11    6    3   12   68
The child has emotional problems related to an exceptional home
  background.                                                                     10   20   12   23   34
The child is better at mental mathematics than written.                           10   16   12   24   39
The child shows an abnormal level of anxiety towards mathematics.                  9   17    8   33   33
Some physical factor such as deafness, poor eye sight, colour blindness
  contributes to this child's low attainment in mathematics.                       6   12    7   30   47
The child seems excessively tired much of the time.                                4   10    5   18   63
The child has suffered frequent changes of mathematics teachers or schools.        1    4   10    4   81

[Figures are percentages of children (N = 215) for whom teachers judged the extent to which the
statement described them to be “definitely” (D), “probably” (P), “unsure” (U), “probably not”
(PN), “definitely not” (DN).]


Table 2 provides a comparison between the mathematically low attaining boys and girls in
this sample. The mean scores given in this table for each statement were obtained by scoring
100, 75, 50, 25 and 0 for each of the responses D, P, U, PN, DN respectively. Hence the
magnitude of the mean score is an indication of the degree of definiteness to which the
statement is judged to describe the group of children in the sample. Values of t have been
calculated to assess the significance of any differences in mean scores between the boys and
the girls. It can be seen from Table 2 that the boys score significantly higher than the girls in
statements 4, 8, 10, 14 and 17, and that the girls score significantly higher than the boys in
statement 7.
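
The scoring scheme can be illustrated by recomputing a mean score from the response percentages in Table 1. The Python sketch below is an approximation: it works from rounded percentages rather than the individual checklists, so it divides by the sum of the percentages (which need not be exactly 100), and the names used are illustrative.

```python
# Score attached to each response, as described in the text.
WEIGHTS = {"D": 100, "P": 75, "U": 50, "PN": 25, "DN": 0}


def mean_checklist_score(percentages):
    """Mean score for one statement from its D/P/U/PN/DN percentages."""
    total = sum(percentages.values())  # may differ from 100 through rounding
    weighted = sum(WEIGHTS[level] * pct for level, pct in percentages.items())
    return weighted / total


# Percentages taken from Table 1 for two statements.
reading = {"D": 53, "P": 24, "U": 3, "PN": 10, "DN": 11}
language = {"D": 47, "P": 23, "U": 4, "PN": 10, "DN": 15}

print(round(mean_checklist_score(reading)))
print(round(mean_checklist_score(language)))
```

For these two statements the recomputed values round to 74 and 69, matching the means for all pupils given for statements 17 and 16 in Table 2.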


TABLE 2 
COMPARISON BETWEEN BOYS AND GIRLS ON MATHEMATICAL LOW ATTAINERS CHECKLIST 

                                                                                     Means
Statement                                                                       Boys  Girls   All  t
 1. Some physical factor such as deafness, poor eye sight, colour
    blindness contributes to this child's low attainment in maths.                25     26    25  0.23
 2. The child has immature motor skills.                                          54     44    50  1.85
 3. The child shows perceptual difficulties, such as reversals of figures or
    poor spatial discrimination.                                                  49     46    48  0.54
 4. The child shows little commitment and interest in school in general.          44     30    38  2.73**
 5. The child shows little commitment and interest in maths lessons.              47     40    44  1.37
 6. The child shows an abnormal level of anxiety towards most tasks in
    school.                                                                       33     38    35  1.06
 7. The child shows an abnormal level of anxiety towards maths.                   30     40    34  2.19*
 8. The child displays behaviour problems, such as hyperactivity, in most
    lessons.                                                                      35     15    27  3.90**
 9. The child has emotional problems related to an exceptional home
    background.                                                                   40     33    37  1.44
10. The child is nearly always preoccupied, appearing to find school and
    learning irrelevant.                                                          41     28    36  2.54*
11. The child experiences social difficulties with the peer group.                36     32    35  0.76
12. The child is immature in relationships with other pupils.                     46     38    43  1.44
13. The child has difficulty in relating to adults.                               40     32    37  1.60
14. The child responds sensibly in a one to one conversation with a
    teacher but behaves badly in front of other children.                         35     18    28  3.41**
15. The child has been considered low attaining in maths from the first
    year in this school.                                                          81     83    83  0.32
16. The child's language skills are poorly developed.                             71     67    69  0.78
17. The child's reading skills are poorly developed.                              79     68    74  2.27*
18. The child is low attaining in most areas of the curriculum.                   79     75    77  0.85
19. The child is equally poor in all aspects of maths.                            71     75    73  0.85
20. The child is better at mental maths than written.                             37     29    34  1.65
21. The child has better understanding of number than spatial concepts.           58     56    57  0.44
22. The child has been absent frequently in the last year.                        18     22    20  0.85
23. The child has suffered frequent changes of maths teachers or schools.         12      7    10  1.57
24. The child seems excessively tired much of the time.                           18     18    18  0.00

** Difference between boy/girl means significant at the 1 per cent level. 
*  Difference between boy/girl means significant at the 5 per cent level. 

[Means obtained by scoring 100, 75, 50, 25, 0 for responses D, P, U, PN, and DN respectively.] 
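
The scoring rule in the note to Table 2 can be illustrated with a minimal sketch. The score values are those given in the note; the function name and the response lists are hypothetical and are not data from the study.

```python
# Score values taken from the note to Table 2.
SCORES = {"D": 100, "P": 75, "U": 50, "PN": 25, "DN": 0}


def mean_score(responses):
    """Return the mean checklist score for a list of response codes."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(SCORES[code] for code in responses) / len(responses)


# Invented teacher judgements for one statement (illustration only).
boys = ["D", "P", "U", "PN"]
girls = ["P", "U", "DN", "DN"]
print(mean_score(boys))   # 62.5
print(mean_score(girls))  # 31.25
```

A mean near 100 therefore indicates that the statement was usually judged D, and a mean near 0 that it was usually judged DN.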


DISCUSSION 


The picture of mathematical low attaining 10-year-olds which emerges from Table 1 
shows some interesting features. The first five statements in this table are considerably more 
frequently judged to be true about low attainers in mathematics than the rest. It is fair to 
conclude therefore that it is very commonly the case that the low attainer in mathematics at 
this age will have been considered a low attainer for some time, will be low attaining in most 
areas of the curriculum and in all aspects of mathematics, and will have poor reading and 
language skills. The frequency with which language difficulties are associated with low 
attainment in mathematics should be noted especially. This supports the argument that 
mathematical activities designed specifically for language development have a crucial part to 
play in the development of low attaining children’s understanding of mathematics (Choat, 
1974; Jones and Haylock, 1985). The frequency with which reading difficulties are found to be 
associated with low attainment in mathematics is not surprising given Rothery’s (1980) 
findings concerning the mismatch between the readability of typical primary school 
mathematics texts and the average reading levels of pupils for whom they are intended, and also 
Clements’ (1980) finding that 24 per cent of the errors made in mathematics by low achieving 
12-year-olds are simply reading or comprehension errors. Table 2 indicates that poor reading 
skill is significantly more of a problem for the boys in the sample than for the girls. 


Social immaturity and behaviour problems tend to dominate the middle of the ranking in 
Table 1. But Table 2 indicates that these are significantly more associated with boys than girls. 
More often for low attaining boys than for girls is it the case that the child shows little 
commitment and interest in school in general, displays behaviour problems, appears 
preoccupied, finding school and learning irrelevant, and behaves particularly badly in front of 
other children (statements 4, 8, 10, 14). On the other hand the low attaining girls show 
significantly higher levels of anxiety towards mathematics (statement 7). This latter finding supports 
the findings of several studies concerning sex differences in attitude to and self-confidence in 
mathematics (Schildkamp-Kundiger, 1980). Dornbusch (1974), for example, reports that most 
students give lack of ability as the main reason for failure in most subjects of the curriculum, 
whereas in mathematics girls give this reason much more frequently than boys. It is relevant 
therefore to note that the difference between the low attaining boys and girls in the present 
research in terms of anxiety towards mathematics (statement 7) was more marked than that 
towards most tasks in school (statement 6). 


Requests for reprints should be sent to Dr. D. W. Haylock, School of Education, University of East 
Anglia, Norwich, NR4 7TJ, England. 
THE REDUCTION OF DISRUPTIVE BEHAVIOUR IN TWO SECONDARY 
SCHOOL CLASSES 


By E. MCNAMARA, M. EVANS AND W. HILL 
(Lancashire Schools’ Psychological Service) 


SUMMARY. The constituent parts of a five component behavioural intervention package are 
described and the effect of the intervention on the on-task behaviour of two "disruptive" secondary 
school classes reported. It is claimed that levels of pupil on-task behaviour were significantly 
increased in both classes. Concomitant changes in teacher behaviour were also reported and it is 
suggested that increased levels of pupil on-task behaviour may have elicited higher levels of positive 
verbal behaviour from the teachers. 


INTRODUCTION 

Merrett (1981), in a review of studies of behaviour modification in British educational 
settings, concluded that relatively little behavioural work had been carried out in British 
schools in general and secondary schools in particular. The emphasis of behavioural research 
has been in the primary school sector, e.g., Tsoi and Yule (1976), Merrett and Wheldall 
(1978), and only very occasionally have reports involving secondary school pupil groups 
appeared, e.g., Blundell and Merrett (1982), McNamara (1984). Yet the secondary school 
sector has been identified as an area of particular concern in relation to disruptive pupil 
behaviour (Upton and Gobell, 1980). So limited has been the research involving the use of 
behavioural methods in the classrooms of disruptive pupils in secondary mainstream settings 
that the generalisability of the positive outcome research referred to earlier has yet to be 
demonstrated. The aim of the research reported in this paper was to investigate the effect of 
behavioural interventions on the levels of disruptive behaviour in two secondary school 
classes. A positive outcome would help build up a body of empirical support for the 
generalisability of such findings. 


METHOD 
The studies were carried out in a comprehensive school catering for pupils aged 11 years 
to 16 years. The catchment area served was a mixed middle class and working class 
community. The school set the pupils by ability. There were eight sets in each year group. The 
lowest set comprised pupils with learning difficulties; some of the pupils also engaged in 
disruptive behaviour. The bottom set of each year group was known as the "remedial" 
class. 


Experimental Class 1 

This was the second year remedial class and mathematics was the subject taught during 
the period of the research. The topics covered by the use of work cards and text books 
included fractions (equivalent fractions, cancelling mixed numbers and improper fractions), 
shapes and rectangles and percentages. There were 17 pupils in the class aged 12 to 13 years — 
9 boys and 8 girls. It was observed that they came into the classroom noisily, jostled each other 
for places, sitting where and with whom they wished. Sometimes tables were pushed together 
without the teacher's permission being sought. Many pupils arrived at the lesson without 
equipment (pencils and rulers). A prompt start to the lesson proved impossible as these pupils 
pestered the teacher for equipment. During the lesson the pupils got up out of their places 
without hesitation and chatted to each other. 


Experimental Class 2 

This was similar to experimental class 1 but was a third year group; remedial 
mathematics was the subject taught. The topics covered by the use of work cards and text books 
included length, mass, capacity, scale drawings, area, and work involving calculation. There 
were 15 pupils in the class aged 13 to 14 years — 10 boys and 5 girls. The pupils were noisy and 
disruptive at the start of each lesson. They sat where they wished, shouted out and walked 
about the classroom. 
The teacher 

The teacher was a 23-year-old graduate with a post-graduate certificate in remedial 
education. She had one year’s experience teaching in the school: in that year she had taught 
experimental class 1 for English and the experimental class 2 for mathematics. She reported 
that she had always experienced some difficulties in controlling both classes and readily agreed 
to participate in the research project. She felt that she got on well individually with most of the 
pupils in both classes, but was concerned that she was frequently having to send for senior 
members of staff because confrontations with pupils often escalated into serious incidents. 


Observers 

Two experienced teachers agreed to act as observers for the study. The training consisted 
of three one-hour sessions and covered the issues of collecting data in naturalistic settings, 
defining behaviour objectively and the concept of observer reliability. The observation 
schedule was pre-tested using video recordings of classroom interactions followed by field 
testing in the classroom situation during four "habituation" lessons. Possible observer bias 
was countered by drawing the observers' attention to the diametrically opposed outcomes of 
the studies of McAllister et al. (1969) and McCullough (1972). Observer drift was counteracted 
by initiating a further observer training session after the collection of baseline data for class 1. 


Design of study 

A multiple baseline across groups design was used. Baseline data for experimental class 1 
was collected (four lessons). The intervention package (to be described later) was then 
implemented for nine lessons; the baseline data for experimental class 2 was collected for a 
further five lessons (baseline totalled nine lessons), after which the intervention package was 
introduced into experimental class 2 for three lessons. 


Data collection 

Data were collected for pupil on-task/off-task behaviour and for teacher positive and 
negative verbal behaviour. On-task behaviour was defined as attending to the teacher, set 
assignment or other pupils as appropriate. Off-task behaviour consisted of all pupil behaviour 
other than that included in the definition of on-task. 


All teacher positive and negative verbal behaviour was recorded verbatim. This consisted 
of positive or negative comments following appropriate or inappropriate pupil behaviour. The 
type of pupil behaviour, either academic, e.g., correct academic response, or social, e.g., 
sitting quietly, was also recorded. Positive teacher verbal behaviour was defined as comments 
such as "good", "correct", "that's right"; negative verbal behaviour was defined as 
comments such as "shsh", "be quiet", "stop that", "get back in your seat". 


Ten-second interval recording schedules were used. This schedule allowed for pupil 
behaviour and teacher behaviour to be recorded on the same recording sheet according to the 
following schedule. The behaviour of all pupils in the class was observed sequentially 
according to a predetermined order. A momentary time sampling procedure was used with 
each pupil being observed sequentially on every 10th second. Observation and recording were 
carried out for two minutes (12 observations) followed by a one minute rest period during 
which the observers made notes about classroom events. The accumulation of these notes 
provided the basis for a descriptive analysis to complement the empirically based experimental 
analysis. The timing was achieved using a shared stop-watch. Observations started when the 
teacher indicated that the lesson had begun and were concluded at the end of the last full 
two-minute observation period prior to the teacher indicating that the lesson was being 
concluded. 
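
The recording schedule described above can be sketched as follows. This is an illustration under stated assumptions rather than the observers' actual procedure: the function name is hypothetical, samples are assumed to fall at the end of each 10-second interval, and the pupil rotation is assumed to continue across rest periods rather than restart.

```python
def observation_schedule(lesson_seconds, n_pupils):
    """Return (second, pupil_index) pairs for momentary time sampling.

    Each three-minute cycle holds 12 samples at 10-second spacing followed
    by a 60-second rest; pupils are visited in a fixed rotating order.
    """
    obs_block, rest, step = 120, 60, 10
    schedule = []
    block_start = 0
    sample_no = 0
    # Only complete two-minute observation blocks are used.
    while block_start + obs_block <= lesson_seconds:
        for offset in range(step, obs_block + 1, step):
            schedule.append((block_start + offset, sample_no % n_pupils))
            sample_no += 1
        block_start += obs_block + rest
    return schedule


# A 45-minute lesson with the 17 pupils of experimental class 1.
schedule = observation_schedule(45 * 60, 17)
print(len(schedule))  # 180
print(schedule[0])    # (10, 0)
```

Under these assumptions a 45-minute lesson yields 15 observation blocks and 180 samples, so each pupil in a class of 17 is sampled ten or eleven times per lesson.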


Teacher positive and negative verbal behaviour was recorded at the same time as pupil 
behaviour was being sampled. The teacher's positive and negative comments were taken down 
verbatim. As the pupil behaviour was being sampled at every 10th second the observers were 
free to focus their attention on the teacher for the remainder of the observation period. 


Intervention 

(i) Classroom seating arrangements were altered so that tables were set in rows and the 
pupils sat two to a table. Previously the tables were scattered about the room and were 
occasionally pushed together by the pupils. 
(ii) Rules of the classroom were displayed on a flip-chart in a prominent position at the 
front of the classroom. Copies of the rules were given to each pupil on a printed sheet. The 
rules were read over and commented on by the teacher at the beginning, middle and end of the 
lesson. The rules were: (a) arrive on time; (b) have the correct equipment; (c) no shouting out; 
(d) work quietly; (e) don’t prevent others working. 


(iii) The teacher was asked to make evaluative statements after commenting on the rules 
at the end of the lesson. 


(iv) As a reward, if the teacher evaluated the class’s behaviour positively, the last 10 
minutes of the lesson could be spent doing puzzles. These consisted of, for example, messages 
in number codes. The pupils found these puzzles interesting and enjoyable. 


(v) For self-assessment the rules of the classroom were distributed at the beginning of the 
lesson. The pupils were instructed to put a tick against those rules which they had followed 
and a cross against those rules which they had breached. The teacher indicated that she would 
scrutinise their self-assessments. 


RESULTS 


Procedural implementation data 

Records were kept to assess the extent to which the agreed intervention procedures were 
implemented. These were substantially implemented in all experimental phase lessons except 
lessons 7 and 12. This lack of implementation of the intervention is associated with a 
temporary drop in the otherwise higher levels of pupil on-task behaviour as seen across the 
experimental phase. 


Inter-observer agreement 
The reliability of data for pupil behaviour was calculated using the formula: 


$$\text{Inter-observer agreement} = \frac{\text{No. of agreements}}{\text{No. of agreements} + \text{No. of disagreements}} \times 100$$
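
As a worked illustration of the agreement formula for pupil behaviour, the sketch below compares two observers' interval-by-interval codes. The function name and the two records are hypothetical; they are not data from the study.

```python
def percent_agreement(observer_a, observer_b):
    """Agreements / (agreements + disagreements) x 100 for paired records."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("records must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    disagreements = len(observer_a) - agreements
    return agreements / (agreements + disagreements) * 100


# One invented two-minute block of 12 samples; the observers differ once.
observer_a = ["on", "on", "off", "on", "off", "on",
              "on", "off", "on", "on", "on", "off"]
observer_b = ["on", "on", "off", "on", "on", "on",
              "on", "off", "on", "on", "on", "off"]
print(round(percent_agreement(observer_a, observer_b), 1))  # 91.7
```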


The reliability of teacher positive verbal behaviour was calculated using the formula: 


Inter-observer agreement = Het Ren. x 100 


The reliability of teacher negative verbal behaviour was calculated using the same 
formula. 


The average level for (a) pupil on-task/off-task behaviour was 93.2 per cent; (b) teacher 
positive verbal behaviour was 75.4 per cent; (c) teacher negative verbal behaviour was 95.5 
per cent. 


Pupil on-task behaviour 
The lesson by lesson average levels of pupil on-task behaviour across baseline and 
intervention phases are presented in Figure 1. 


The introduction of the intervention package produced an immediate reversal in the 
downward trend of on-task behaviour in class 1. This treatment effect was maintained over 
the nine lesson periods of the intervention, covering a time span of four weeks. The average 
level of on-task behaviour during this period was 75 per cent, representing an average increase 
of 26 percentage points over baseline levels. This increase in on-task behaviour represented an increase 
from an average level of approximately 22 minutes per 45-minute lesson to one of 
approximately 34 minutes. Inspection of Figure 1 reveals that with one exception (lesson 7) all 
the data points for on-task behaviour exceeded the highest level achieved during baseline. This 
lack of overlap of levels of on-task behaviour during baseline and intervention phases gives 
grounds for claiming reliability of effect of intervention (Kazdin, 1982), a claim supported by 
examination of the effect of intervention for class 2. In this class pupil on-task behaviour 
remained unaltered while that in class 1 rose, but then rose from an average of 54 per cent 
during baseline to an average of 69 per cent during the three lessons in which the intervention 
was in operation. 
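
The conversion from percentage on-task to minutes per lesson quoted above can be checked with a short sketch. The function name is hypothetical; the baseline figure of 49 per cent is inferred here from the reported intervention level of 75 per cent and the reported rise of 26.

```python
LESSON_MINUTES = 45  # lesson length stated in the text


def on_task_minutes(percent_on_task, lesson_minutes=LESSON_MINUTES):
    """Convert a percentage of on-task samples to minutes of lesson time."""
    return percent_on_task * lesson_minutes / 100


baseline_pct = 75 - 26  # 49 per cent, implied by the reported rise
print(on_task_minutes(baseline_pct))  # 22.05
print(on_task_minutes(75))            # 33.75
```

These values round to the approximately 22 and 34 minutes reported. The rise of 26 is a difference between two percentages; relative to a baseline of 49 per cent it corresponds to roughly a 53 per cent increase in on-task time.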
Teacher positive and negative verbal behaviour 

The rates of teacher positive and negative verbal behaviour per 45 minutes are reported in 
Table 1. In class 1 the ratio of the balance of teacher control in favour of negative verbal 
behaviour was 1:2.8. Consequently, the teacher was almost three times as negative as positive 
towards the children. This finding of reliance by the teacher on negative control strategies is 
consistent with the findings of other surveys on natural rates of teacher positive and negative 
behaviour, e.g., White (1975), Thomas et al. (1978). The findings on teacher behaviour in 
class 1 were reflected in the data for class 2: teacher negative verbal behaviour exceeded 
positive by a ratio of 4.4:1. 


TABLE 1 
RATE OF TEACHER POSITIVE AND NEGATIVE VERBAL BEHAVIOUR PER 45-MINUTE LESSON 





Baseline 
Intervention 





DISCUSSION AND CONCLUSIONS 


The multiple baseline experimental design employed avoided the necessity of a reversal 
phase to demonstrate experimental control of pupil behaviour. The introduction of the 
experimental variable in class 1 changed pupil behaviour for class 1 but not class 2. The 
subsequent introduction of the experimental variable in class 2 then effected change in pupil 
behaviour in that class. Consequently it was concluded that the intervention package effected 
considerable behaviour changes when introduced. The use of a multi-component intervention 
package militates against the identification of the contributory effects of the individual 
intervention components towards the behaviour change effected. However, in the applied 
psychology field, at least initially, effecting change is more important than fully 
understanding the underlying mechanisms which bring about the change. 


Scrutiny of Figure 1 reveals quite marked variations in pupil behaviour during the 
experimental phase for class 1 and the baseline phase for class 2. Some of the variability in 
these phases is accounted for by confounding variables. The identification of such variables is 
helpful in the experimental analysis of behaviour for the effect of the experimental variable 
can then be identified more precisely. For example, the level of on-task behaviour during 
baseline lesson 4 for class 2 was the only datum point that overlapped with the intervention 
phase range of on-task behaviour. Consequently, baseline lesson 4 was an important lesson in 
that on initial visual inspection of the data it appeared to detract from the power of 
demonstration of the effect of intervention, i.e., had there been no overlap the demonstration 
of the effect of intervention would have been highly convincing (Kazdin 1982). However, the 
content of lesson 4 was very different from that of any of the other baseline or experimental 
phase lessons. 


The whole lesson was given over to the use of hand calculators — and this activity 
appeared to have a very high interest level for the pupils concerned. This "explanation" of the 
high level of pupil on-task behaviour for this baseline lesson therefore reduces the 
confounding effect it would otherwise have for data interpretation. It also emphasises the 
need to supplement the experimental analysis of behaviour with a descriptive analysis. 


Certain effects could be ascribed to variation in implementation of the intervention. For 
the first two post-intervention lessons for class 1 all five aspects of the implementation 
package were implemented. This resulted in an immediate increase in levels of pupil on-task 
behaviour from 49 per cent to 65 and 79 per cent respectively. Pupil on-task behaviour then 
dropped to 58 per cent in the third lesson of the intervention phase (lesson 7). A superficial 
interpretation of this might be that the initial effectiveness of the intervention package was 
wearing off. However, inspection of procedural implementation data indicated that only one 
of the five intervention strategies was implemented (self-assessment). Hence, the low level of 
pupil on-task behaviour could not be attributed to lack of effect of the intervention but rather 
to non-implementation of the package. A similar association is seen in lesson 12. The level of 
pupil on-task behaviour was depressed and in this lesson only one component (rules) was 
implemented. 


Table 1 indicates that significant changes occurred in teacher behaviour between baseline 
and intervention phases. In class 1 the rate of teacher positive behaviour increased by two to 
three times the baseline rate. In addition, the rate of teacher negative behaviour increased but 
to a much lesser extent. Class 2 data mirrored that of class 1 to some extent: teacher positive 
behaviour increased to a level 59 per cent above baseline rate. There was, however, little 
change in the rate of teacher negative behaviour in class 2. 


These changes in teacher behaviour between baseline and intervention phases confound 
the exactitude with which the claim can be made that pupil behaviour was changed by the 
intervention package. An alternative explanation to that which claims that the intervention 
package directly influenced pupil behaviour is one which hypothesises that the intervention 
package influenced teacher behaviour and that this change in teacher behaviour influenced 
pupil behaviour. 


Another interpretation of the data is available. It may have been that increased rates of 
pupil-appropriate behaviour were promoted by the intervention package and that this 
increased rate of pupil behaviour in turn elicited more positive behaviour from the teacher. 
This interpretation, i.e., that pupil behaviour may sometimes determine teacher behaviour, is 
not new. Sherman and Cormier (1974) demonstrated that increases in levels of pupil- 
appropriate behaviour brought about by means independent of the teacher produced an 
increase in the relative level of teacher positive responses to that behaviour — attributable to 
the increased availability of appropriate behaviour to which the teacher could respond. 


A consideration of reciprocity theory (Patterson and Reid, 1971) lends theoretical 
support to Sherman and Cormier's view in that it predicts that teacher behaviour should 
change as pupil behaviour changes; their view too is compatible with that of Tharp and Wetzel 
(1969) who have long conceptualised social interactions in terms of two-way reinforcement. 


However, contingency management at the interpersonal level was not a planned feature 
of the intervention package used for this study. Indeed the only operant based component of 
the intervention package (a rewarding activity at the end of the lesson) was regularly omitted 
by the teacher who judged it disruptive of the smooth running of the lesson. Examination of 
the remaining four components of the intervention package supports Merrett's (1981) view 
that we need to look more to the setting events and ecological variables (see Wheldall, 1981) 
when considering inappropriate pupil behaviour at the secondary level: for the statement of 
rules and seating arrangements are certainly important antecedents of pupil behaviour. In 
addition, the psychology of feedback, e.g., Van Houten (1984) may also be a source of useful 
material to promote more adaptive pupil behaviours — for feedback has been a significant 
component in the successful interventions reported to date, e.g., McAllister et al. (1969), 
McNamara (1984), and was a component of the intervention package reported in this study. 
Finally, a further area of applied psychology, that of self-management (see Karoly and 
Kanfer, 1982), may provide an additional theoretical resource base from which practices (such 
as self-recording as used as part of this study) can be generated. 


Requests for reprints should be sent to Mr E. McNamara, Schools’ Psychological Service, Area 3, 
Joint Divisional Offices, East Cliff, Preston PR1 3EU, England. A more detailed report and analysis of 
this data can also be obtained on application to Mr. McNamara. 
WHEN NONSENSE IS BETTER THAN SENSE: NON-LEXICAL ERRORS 
TO WORD READING TESTS 


By G. BRIAN THOMPSON 
(Department of Education, Victoria University of Wellington, New Zealand) 


SUMMARY. Data are presented to examine alternative interpretations of errors of different lexical 
status which occur as responses to word reading tests. In a sample of 261 6½-year-old children, the 
incidence of non-lexical substitution errors was positively associated with level of reading progress, 
while that of lexical substitution errors was not. Beginner readers of high reading progress 
apparently made more attempts at phonological mediation and were more susceptible to breaking 
lexical constraints in their responses than children of low progress. 


INTRODUCTION 


Word reading tests are widely used as a measure contributing to the assessment of an 
individual's reading attainment. Advice is commonly offered encouraging users to consider 
the types of error responses to the test, as well as the test score (Scottish Council for Research 
in Education, 1976, p. 2; Gilmore et al., 1981, p. 7). However, the research basis for this use 
of word test responses appears to be lacking. The purpose of this paper is to contribute to such 
research. 


Error responses to word reading tests may be classified into the following (mutually 
exclusive and exhaustive) classes: 


(a) Lexical substitutions: error responses which are lexical items ("real words"). 
(b) Non-lexical substitutions: error responses comprising English phonological segments 
which are not English lexical items ("nonsense words" or pseudowords), e.g. /quing/. 
(c) No response: failure to make a response although the item is attempted. 
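
The three-way classification can be sketched in code. This is a simplification under stated assumptions: the study classified spoken responses by their phonological segments, whereas the sketch compares transcribed strings against a small hypothetical lexicon, and all names in it are illustrative.

```python
from typing import Optional, Set

LEXICAL = "lexical substitution"
NON_LEXICAL = "non-lexical substitution"
NO_RESPONSE = "no response"


def classify_error(response: Optional[str], lexicon: Set[str]) -> str:
    """Classify an error response to an attempted test word.

    Assumes the response has already been judged incorrect.
    """
    if response is None or not response.strip():
        return NO_RESPONSE
    if response.strip().lower() in lexicon:
        return LEXICAL
    return NON_LEXICAL


# Hypothetical lexicon standing in for the real words of English.
lexicon = {"quick", "queen", "journey", "belief"}
print(classify_error("quick", lexicon))  # lexical substitution
print(classify_error("quing", lexicon))  # non-lexical substitution
print(classify_error(None, lexicon))     # no response
```

Because every error falls into exactly one branch, the classes are mutually exclusive and exhaustive, as the text requires.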


Errors of class (b) are of particular interest in view of the theoretical distinction between 
processes of word identification by (i) phonological mediation involving access to 
phonological segments, or by (ii) direct access, without phonological mediation (Smith, 1978; 
Baron, 1979). 


The processes which result in a non-lexical substitution error require access to 
phonological segments of words. Direct processing alone cannot suffice, as non-lexical 
substitutions are, by definition, items which are not in the reader's vocabulary. These errors would 
therefore represent unsuccessful attempts at phonological mediation. A lexical substitution 
may represent either an unsuccessful attempt at direct processing, or at phonological 
mediation which has produced a lexical response. Attempts at phonological mediation do not 
necessarily produce overt responses. Therefore ‘‘no response” errors may also result from 
either type of processing. 


There are two opposing expectations about individual differences in the incidence of non- 
lexical errors in the word reading test responses of beginner readers. It might be expected that 
higher progress beginner readers would only rarely make responses which break lexical 
constraints and hence would have a lower incidence of non-lexical responses than lower 
progress readers. This supposition is consistent with the claim that nonsense word responses 
are "indicators of potential reading difficulties" (Gilmore et al., 1981, p. 2). On the other 
hand the postulated developmental change from an initial "logographic" stage to an 
"alphabetic" stage (Frith, in press) implies that higher progress beginner readers would make 
more attempts at phonological mediation, even at the expense of breaking lexical constraints. 
If the first expectation were correct, the incidence of non-lexical errors would be negatively 
correlated with level of progress of beginner readers; if the second were correct, it would be 
positively correlated. Data to distinguish between these alternatives are reported. 




METHOD 


The sample was 261 children (130 girls and 131 boys), comprising all the English-speaking 
children aged 6-5 to 6-11 who were attending eight state schools in an urban region of New 
Zealand. All children had received school reading instruction for at least 12 months. This 
instruction emphasised the "book experience" approach (Clay, 1979) and did not include any 
systematic teaching of correspondence between letter sequences and sound segments of words. 


A nationally standardised revision of the Burt Word Reading Test (Gilmore et al., 1981) 
was administered individually to each child. This word reading test comprises 110 words 
arranged in order of difficulty which are read aloud by the subject, to a ceiling point at which 
10 successive errors are made. (The order of difficulty was revised in this standardisation.) The 
score of words correct was determined according to the standardised scoring procedure. 
Errors up to and including the ceiling point were classified as above. Failure to give an audible 
utterance comprising at least one English phonological segment was classified as ‘‘no 
response’’. If there were more than one substitution response to a word, the last response was 
the only one classified. Pilot trials showed that the error classification criteria were 
unambiguous. It may be noted that variant pronunciations consistent with the dialect of the 
child were accepted as correct, which is the standardised scoring procedure of the test. 
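
The ceiling rule can be illustrated with a minimal sketch. The function name and the response pattern are hypothetical, and the standardised scoring procedure of the test may involve details not represented here.

```python
def administered_items(correct_flags, ceiling_run=10):
    """Return the items up to and including the ceiling point.

    The ceiling point is the item at which `ceiling_run` successive
    errors have been made; if it is never reached, all items count.
    """
    run = 0
    for index, is_correct in enumerate(correct_flags):
        run = 0 if is_correct else run + 1
        if run == ceiling_run:
            return correct_flags[: index + 1]
    return correct_flags


# Invented record: True is a word read correctly, False is an error.
record = [True] * 5 + [False] * 3 + [True] * 2 + [False] * 12
within_ceiling = administered_items(record)
print(len(within_ceiling))                        # 20
print(sum(within_ceiling))                        # 7
print(len(within_ceiling) - sum(within_ceiling))  # 13
```

In this invented record the child reads 7 words correctly and makes 13 errors within the ceiling; the two further errors after the ceiling point are not classified.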


RESULTS 


Subjects scoring close to the lower limit of the test may tend to make relatively fewer 
responses of any kind, whether correct or incorrect. Hence, to consider this as a possible 
artefact in the correlations, a comparison was made of a restricted range of the subject sample 
with the full sample. In the restricted range sample, subjects with a reading test raw score of 
less than 10 were excluded. Subjects who were very high progress readers, with a reading age 
beyond 8-6 (raw score of 50 or more) were also excluded from the restricted range sample, as 
they tend to produce a large number of responses including error responses. With these 
exclusions there remained 224 subjects in the restricted range sample. In the full sample of 261 
subjects the mean and standard deviation of the number of words correct (see Table 1) was 
virtually the same as in the test standardisation sample for the 6-6 to 6-11 age group, mean 
28.2, SD 13.4 (Gilmore et al., 1981). 
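
The exclusion rule for the restricted range sample can be stated compactly. The cut-offs are those given above; the function name and the raw scores are hypothetical.

```python
def in_restricted_range(raw_score):
    """True if a raw score is kept in the restricted range sample.

    Scores below 10 and scores of 50 or more are excluded.
    """
    return 10 <= raw_score < 50


# Invented raw scores (illustration only).
raw_scores = [4, 9, 10, 28, 49, 50, 63]
print([s for s in raw_scores if in_restricted_range(s)])  # [10, 28, 49]
```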


TABLE 1 

INCIDENCE OF TYPES OF ERROR RESPONSES AND CORRELATIONS WITH WORDS CORRECT, BY RESTRICTED 
RANGE AND FULL SAMPLES 

                                  Mean incidence         Product-moment correlation
                                                         with words correct
Type of response                  Restricted   Full      Restricted   Full
Words correct (test score)        28.3         28.2
                                  (9.7)        (13.2)
Errors within administration ceiling:
  No response                     8.8          9.0       -0.11        -0.03
                                  (4.6)        (4.9)
  Lexical substitution            6.0          5.8       -0.15        -0.12
                                  (3.5)        (3.6)
  Non-lexical substitution        1.7          2.2       +0.56        +0.63
                                  (3.2)        (4.6)
  Total errors                    16.5         16.9      +0.19        +0.46
                                  (4.0)        (4.9)

Note: Numbers in parentheses are standard deviations. 


In both the restricted range and full samples there was a moderate positive correlation 
between words correct on the reading test and incidence of non-lexical substitution errors. As 
the correlation remained in the restricted range sample it was not a mere artefact of upper or 
lower limits of responses. Moreover, the correlation cannot be interpreted as merely indicating 
that high progress readers were making more errors of any kind than low progress readers. 
This cannot be the case as the correlations of words correct with the other types of errors, 
lexical substitutions and "no response" errors, were virtually negligible (Table 1). 


In both the restricted range and full samples, 39 per cent of the subjects produced one or 
more non-lexical substitutions. Hence the incidence of non-lexical substitutions was highly 
positively skewed. Although the product-moment correlation has been applied, the extent of 
skewness in the distribution of this variable requires some caution about the application. As 
an alternative measure of correlation, the biserial r was calculated by categorising subjects into 
those without a non-lexical substitution in their test responses and those with one or more. 
The biserial r had a value of +0.65 for the restricted range sample and +0.74 for the full 
sample. It may be noted that an analysis comparing error types as a percentage of words 
attempted was not feasible, in view of the proportion of subjects without a non-lexical error; 
neither can an analysis be justified comparing error types as a percentage of all errors 
(Thompson, 1984). 
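
The biserial coefficient described above can be illustrated with a minimal sketch in Python. It assumes the usual textbook formula, r_b = ((M1 - M0) / s) x (pq / y), where M1 and M0 are the mean test scores of subjects with and without a non-lexical substitution, s is the standard deviation of all scores, p and q are the proportions in the two categories, and y is the ordinate of the standard normal curve at the point dividing p from q. The function names and the demonstration scores are hypothetical and are not the data of the study. 

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

# Hypothetical demonstration data (NOT the study's data):
# words correct, and whether the child made one or more non-lexical substitutions.
demo_scores = [12, 18, 22, 25, 27, 30, 33, 36, 41, 45]
demo_flags = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0]


def _split(scores, flags):
    """Return (scores of flagged subjects, scores of unflagged subjects, p)."""
    if len(scores) != len(flags):
        raise ValueError("scores and flags must have the same length")
    with_error = [s for s, f in zip(scores, flags) if f]
    without_error = [s for s, f in zip(scores, flags) if not f]
    if not with_error or not without_error:
        raise ValueError("both categories must contain at least one subject")
    return with_error, without_error, len(with_error) / len(scores)


def point_biserial_r(scores, flags):
    """Product-moment correlation between a score and a 0/1 category."""
    with_error, without_error, p = _split(scores, flags)
    q = 1.0 - p
    return (mean(with_error) - mean(without_error)) / pstdev(scores) * sqrt(p * q)


def biserial_r(scores, flags):
    """Biserial r: treats the 0/1 category as a cut on a latent normal variable."""
    with_error, without_error, p = _split(scores, flags)
    q = 1.0 - p
    # Ordinate of the standard normal density at the point dividing p from q.
    y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(with_error) - mean(without_error)) / pstdev(scores) * (p * q / y)
```

Because the biserial coefficient assumes a normally distributed variable underlying the dichotomy, it is always larger in absolute value than the point-biserial coefficient computed from the same data, which is consistent with the biserial values reported here exceeding the product-moment values in Table 1. 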


There is the possibility that a non-lexical response is a “mispronunciation” of an item 
otherwise successfully identified by direct access, rather than an item representing a reader's 
unsuccessful use of phonological mediation. An attempt was made to classify those non- 
lexical responses compatible with this possibility but the number which could be so classified 
was negligible. The following are examples of the kind of non-lexical errors obtained (stimulus 
word in parenthesis): obtin (obtain), tingey (tongue), Joaney (journey), quing (quickly), belife 
(belief), comminced (commenced). There is also the possibility that the test words vary in 
regularity of letter-sound correspondence, with more irregular words among those attempted 
by the higher progress readers accounting for the higher incidence of non-lexical responses. 
Using Venezky (1970) as a guide, each test word was classified as regular or irregular. There 
were six irregular words among items 1-20, five among items 21-40, six among 41-60, five 
among 61-80 and six among items 81-100 of the test. The incidence of irregular words in the 
test does not vary with difficulty level of the words. 


DISCUSSION 


It may seem surprising that non-lexical substitution errors on the word reading test were 
associated positively with level of reading progress, while errors fulfilling lexical constraints 
were not. Higher progress beginner readers apparently make more attempts at phonological 
mediation even at the expense of breaking more lexical constraints than low progress readers. 
It is not plausible to suppose that high progress readers are just less successful at phonological 
mediation than lower progress readers. The higher incidence of non-lexical errors among 
higher progress readers can only plausibly reflect relatively more attempts at phonological 
mediation. Moreover, it is not merely that the high progress subjects are making more 
unsuccessful attempts of any kind. For instance, there was a negligible correlation between 
level of reading progress and the incidence of lexical substitutions. There is no justification for 
regarding lexical substitution errors as superior to non-lexical error responses to a word 
reading test. It is in fact shown to be the converse for 6½-year-old children of a wide range of 
reading attainment, as would be expected if beginner readers progress developmentally from a 
“logographic” to an “alphabetic” stage. 

ESSAY REVIEW 


Case, R. (1985). Intellectual Development: Birth to Adulthood. London: Academic 
Press, pp. 480, £39.50. 


This volume by Robbie Case of the Centre for Applied Cognitive Science at the Ontario 
Institute for Studies in Education is a considerable tour de force. In it he summarises much of 
his earlier work, and presents a model of intellectual development which, not surprisingly, 
owes much to the influences of Baldwin, Piaget and Pascual-Leone, but which is in other 
respects highly innovative. Whether one wishes to embrace the model or not, it is not a 
contribution which should be overlooked. 


The work is an attempt to answer three questions: How do children’s minds function at 
different stages of their development? How do children make the transition from one stage to 
the next? And what steps can parents and educators take in order to optimise the 
developmental process? 


After lucid and critical accounts of the theories of Baldwin, Piaget, Pascual-Leone, 
Bruner and some of the Information Processing School, the author proceeds to outline his 
own model which attempts to integrate them. It is essentially based upon a reassertion of 
structuralism, but one which seeks to preserve the integrity of the microanalysis of individual 
task performances as well as the overall pattern. The centre of focus of this view of 
development is problem solving. The higher-order unit which is introduced to explain this 
activity is the executive control structure which has three essential components: representation 
of the problem situation; representation of the objectives (desired outcomes); and 
representation of the strategy to achieve these objectives. 


“As long as one focuses on the executive control structures that children possess at 
different points in their development, one ensures that the mental activity which precedes 
the assembly or application of any particular strategy, as well as the strategy itself will be 
described” (p. 69). 


In order to disentangle possible changes in executive structure from changes in problem 
complexity, a specific content domain is examined throughout the age-span, initially 
exemplified by comprehension and prediction of a beam balance. 


Case argues that observations of cohorts of children demonstrate that there are 
characteristics of thinking at different ages which professionals recognise, but that these are 
only amenable to description in terms of structures when task characteristics have been 
specified, not in the logicomathematical terminology of Piaget, but in terms of the level of 
domain-specific devices for establishing executive control. The difference is crucial. Unlike the 
Genevan assertion that the stages are characterised by different types of concept, i.e., 
domain-independent logical operations, 


“The present position is that each type of thinking or logical necessity leads to a 
different level of concept or ability, not a different type” (p. 240). 


The process of transition within this model is seen as the integration of erstwhile discrete 
control structures into hierarchically more sophisticated ones. This process has four identified 
components: (i) a novel sequence of existing structures is activated (schematic activation), (ii) 
a new outcome of the rearrangement is noticed (schematic evaluation), (iii) the process is 
recorded in memory for subsequent use (schematic retagging), and (iv) practised until it is 
firmly established (schematic consolidation). Here similarities to the work of Norman (1978) 
are noticeable, as well as a general acceptance that the mainspring of this process is a state of 
disequilibrium, in a form not dissimilar to that propounded by Piaget. Its major difference lies 
in the emphasis placed upon external events involving imitation or modelling, and of the 
importance of culturally devised intellectual tools such as language and counting, as possible 
sources of change. 


“If one considers only the core intellectual processes and sub-processes that have 
been hypothesised (i.e. . . . search. . . evaluation. . . retagging, and consolidation), 
one obtains a view that is highly rationalist, and compatible with Piaget's account of the 
stage-transition process. On the other hand, if one considers the full range of general 
activities in which these processes can be elicited — both social and non-social — . . . 
one obtains a view that is. . . compatible with the accounts of Vygotsky and Bruner” (p. 
280). 


The thesis goes on to argue that the time-span of normal intellectual development 
suggests that a maturational process is also implicated, and this aspect of the model is 
developed from the familiar M-space studies for which Pascual-Leone and Case are perhaps 
best known. In the present text Case defines an individual's total executive processing space as 
comprising a portion available for activating new schemes (the operating space) and that 
required for the maintenance and/or retrieval of recently activated schemes (the short-term 
storage space). The maximum executive processing level (EPL) defines the overall capacity or 
ability of the system. A range of experiments are then cited to test the prediction that, within 
any domain, the capacities of children to solve problems may be defined as the sum of the 
operating space (OP) and the short-term memory space (S), and that the critical values at ages 
4, 6, 8 and 10 years are 1 [OP]; 2 [OP + 1]; 3 [OP + 2]; and 4 [OP + 3]. 
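
The bracketed values above can be read as a simple sum, as the following minimal Python sketch suggests. It assumes, as one reading of the reviewer's notation, that the operating space counts as a single unit at every age and that the short-term storage space grows from 0 to 3 units across the four critical ages. The names used are hypothetical and are not taken from Case's text. 

```python
# Critical ages (in years) as listed in the review.
CRITICAL_AGES = (4, 6, 8, 10)

# Assumption: the operating space (OP) is counted as one unit at every age.
OPERATING_UNITS = 1


def executive_processing_load(age_years):
    """Return (operating units, storage units, total) at a critical age.

    The total is OP + S, where S is the short-term storage space,
    assumed here to rise by one unit at each successive critical age.
    """
    if age_years not in CRITICAL_AGES:
        raise ValueError("age must be one of the critical ages 4, 6, 8 or 10")
    storage_units = CRITICAL_AGES.index(age_years)  # 0, 1, 2 or 3
    return OPERATING_UNITS, storage_units, OPERATING_UNITS + storage_units
```

On this reading a six-year-old is credited with one operating unit plus one stored scheme, giving the value 2 [OP + 1] quoted above. 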


Short-term storage space is seen as essentially under the control of similar maturation 
factors to physical development, but prediction here is confounded by the possibility of 
memory strategies such as “chunking”. The physiological analogue of maturation is 
somewhat speculatively identified as the increase in myelinisation of the nervous system. 


In the book's final section the author grapples with the application of his theory to 
education in a quite detailed manner. In this respect it is refreshingly different from Genevan 
theory. Five general procedures for developing the child's problem-solving abilities are 
outlined. These are (i) specifying goals, (ii) analysing the executive control structures of skilled 
adults, (iii) analysing the developmental precursors of those adult structures, (iv) creating a set 
of educational activities to maximise the child's engagement in the task, and (v) implementing 
the curriculum in a form which permits individualisation of instruction in terms of the 
learner's initial level in the domain, the balance of independent to socially regulated activity, 
and the degree of simplification, prompting and monitored practice needed. 


The breadth and detail of the studies are such as to almost defy cursory comment. At the 
most superficial level one might be tempted to say that the educational prescriptions are hardly 
revelationary, and this is certainly so. However such comment does not detract from the 
theory per se. Indeed, the extent to which a new theory offers a plausible explanation of what 
has been seen by many as the best intuitive style of practising teachers may be said to 
commend it rather than castigate it. The Greeks and Romans had some rules for memorising 
which still stand the test of time; information processing has suggested some reasons for why 
this may be so. 


With reference to the problem-solving paradigm within which the theory is couched, two 
issues at least seem in need of closer scrutiny. The first relates to the hypothetical executive 
structure. Whilst it is undoubtedly the case that the examples cited can be analysed into the 
three components, there seems at present no objective way of determining that by so doing the 
observer is modelling internal processes. Here the realism of Piaget's philosophy is echoed 
loud and clear by Case. It is perhaps salutary to remember Schiller’s critique of Kohler 
(Schiller, 1952) or Guthrie’s admission that, in the final analysis, what matters is not the 
observer’s view of the stimulus, but what it means to the subject. By the same token, how can 
the observer determine that any one of the three components is represented in the mind of the 
subject in the way that Case suggests? A similar point constitutes the second issue. How 
reasonable is it to suggest (pp. 82-117) that the visual tracking of blocks on a simple balance by 
babies in the first four months of life and the exploration of issues of ratio, mass and distance 
compensations, etc., constitute “the same domain” except in the most trivial manner? Whilst 
whole-heartedly subscribing to many of the criticisms of Piaget which Case presents, the 
difference between the two scenarios would seem to be more aptly described as underpinned 
by fundamentally different logicomathematical structures than as similar, with the one an 
elaborate end-on chain of structures superseding the other. 


This is a complex, scholarly book which represents a serious and laudable attempt to steer 
a course between the fortuitous exigencies of the environmentalist lobby and the straitjacket 
of Piagetian structuralism. Insofar as it embodies a genuine empirical approach in which 
predictions are experimentally tested, it is a valuable contribution. Whether, in the long run, 
experimenters will find the paradigm helpful in a variety of disparate areas remains to be seen. 
They should certainly try. 


GEOFFREY BROWN 

BOOK REVIEWS 


Barlow, G., and Hill, A. (Eds.) (1985). Video Violence and Children. Sevenoaks: 
Hodder and Stoughton, pp. 182, £4.95 pbk. 


This book reports research on video nasties, initiated by a Parliamentary group with the 
clear and well-meaning intention of creating legislation to ban them. As the preface says, 
“obviously the first step was to promote a research project which would find out what videos 
children were seeing throughout the country”. The underlying hypothesis was that the 
research would make an “overwhelming” case for the banning of video nasties. 


A number of chapters spare time to attack those who do not agree with their assumptions 
and assume that these “ostensibly distinguished sources” are quite incapable of listening to 
alternative hypotheses. It is quite clear that the contributors in this book are equally unwilling 
to listen to those who would doubt their moral crusade. This book is, in fact, another example 
of the politics of research on television and a good example of why we seem to know so little 
about human motivation and response to circumstances. The politics of television research 
has been often written up. It shows how few people have ever been convinced by evidence and 
shows how rare is an example of people changing their minds. The most worrying question is 
why research should be used in this way by those with clearly closed minds. The evidence in 
this book is an example of the worst aspects of educational research with its pseudo-scientific 
methodology and lack of curiosity. The moral crusade is understandable in one sense. It is a 
very odd reflection of human beings that such nasty videos are made. But this book does not 
say anything new about children’s response to the videos. 


The idea of the research project was to prove how many videos are seen by children (a 
finding that was challenged by those who inspected the methods) and then to assume that what 
children see is copied. This assumption is repeated by a number of distinguished people who 
put their “academic reputations” behind the report as if such assertions could overcome all 
the doubts expressed in the very rough passage the research itself received. More tellingly, the 
assumption about the damage done to children is supported by a curious barrage of statistical 
research in which a statement saying that 50 per cent of psychiatrists said yes and 50 per cent 
of psychiatrists said no is almost invariably followed with a figure to illustrate the same case 
and wherever possible a table and a statement about the statistics. It is as if such paraphernalia 
made the findings “empirical”. 


The main assumption is that children are ready to copy what they see in terms of sex and 
violence, because the camera angle makes the acts depicted seem real from the point of view of 
the protagonist. This is a simple hypothesis which would be interesting if researched but 
unfortunately none of the authors seems to have read very widely in the literature, let alone 
the work of Berkowitz on justified aggression even though this is so close to what they are 
concerned about. They still, instead, go on flogging the dead horse of Feshbach's 1961 paper 
on catharsis as if discounting that paper overcame all the opposition to their point of view. 


The rest of their work is anecdotal. This, of course, can work both ways. As rapidly as 
the chapters give a case of a child staying awake at night as a result of a video nasty so a 
contrary anecdote can be produced showing how children find the “stunts” of video nasties 
extremely amusing. The book, therefore, has a clear purpose and in achieving its legislation it 
is already fulfilling that purpose. The difficult question is whether we are concerned what 
educational psychology should make of such a use of research and the assumption that one 
needs to find evidence to support such an assertion. The concern with video nasties is 
absolutely understandable but as long as people show no interest in open-minded research we 
are not going to get very far in understanding human nature, including the worst aspect of 
human nature demonstrated by this book which is not only the depravity of producing such 
videos but the clinging to a belief in the face of any contrary evidence and in using research to 
foster that belief. This book demonstrates some missionary zeal in stating that all the evidence 
points in one direction; which is as unhelpful as the opposite point of view. Perhaps the 
following quotation can speak for itself in style as well as attitude: 


“In a situation of rapid and radical social change such as we have today, the established 
social values underlying the norms that form the foundational bedrock upon which the whole 
social system rests are being challenged by a multitude of complex interacting forces which are 
evidenced in the phenomenology of behaviour at both an individual and a societal level. The 
young are particularly susceptible to the effects of the instability surrounding foundational 
social values” (p. 163). 


The subject of children's response to videos, like the subject of children's response to 
television, is an important one and it is a pity that research has been used to close down an 
argument rather than pursue it further. There are several interesting points that the authors 
make which are worth taking up, but that would open the possibility of a dialogue and that is 
not what they seem to want to undertake. Many of the points, for example about the role of 
parents, contradict each other from one chapter to the next, but the underlying difficulty is 
that with lack of background reading there has also been a lack of curiosity about what 
actually goes on in children's minds. This book is another example of the way research is used 
to prove a point and should be read as a good or bad example of the psychology of research. 
The tone and the method of this survey are very telling, with wild assumptions followed by a 
lot of statistics. Perhaps it helps one understand the minds of the authors. More worryingly, it 
shows what people want of research. 

CEDRIC CULLINGFORD 


Bennett, N., and Desforges, C. (Eds.) (1985). Recent Advances in Classroom 
Research. Edinburgh: Scottish Academic Press, pp. 201, n.p. 


This book is the second in the Monograph Series recently instituted by the British Journal 
of Educational Psychology. This volume has been edited by Neville Bennett and Charles 
Desforges who have included twelve specially written articles, together with a concluding 
summary, on the theme of classroom research. The editors justify the need for this book by 
pointing to “an increasing fragmentation of research focus and theoretical orientation” 
across this research area. There are three main sections in the book. First come five chapters 
on curriculum content, followed by six describing aspects of the classroom context, with the 
final two chapters having an integrative function. The authors are all well chosen to provide 
authoritative and up-to-date reviews of their respective areas, and the chapters are generally 
well-written and accessible to a student audience. 


The 11 main chapters follow a common structure, with the authors providing a fairly 
brief general review of research on their topic, followed by a more detailed summary of a 
particular project with which they have been involved. However, the balance between these 
two parts does vary considerably with some of the chapters having such brief reviews that the 
general research trends are rather sketchy, while in others the reader is left with only a vague 
impression of the specific project. There is also considerable variation in the length and utility 
of the concluding summaries. In a book of this type it is particularly valuable to allow the 
reader to browse through summaries or abstracts before reading full articles, and the 
variability of the summaries is thus disappointing. It is, of course, notoriously difficult to 
persuade academic colleagues to follow a common pattern, but perhaps the editors might have 
been firmer in this respect. 


The content of the chapters themselves is, however, of a high standard, and the initial 
overview together with full sets of references will prove particularly valuable both to students 
and to researchers from other areas. The chapters cover the curriculum areas of reading, 
writing, mathematics, science, and integrated studies, while the context chapters describe task 
matching, grouping, dissention, decision-making, behaviour modification, and gender 
differences. One of the integrative chapters explores the problems in linking theory and 
practice, but this and the concluding chapter leave the reader with a certain ambivalence. Ted 
Wragg stresses the crucial importance of effective transfer of theory into practice, and yet 
both he and the editors conclude that progress on this front has been limited. The editors draw 
attention to the strides that have been made in developing more effective theories and research 
methods in classroom research, and see some signs of the existing narrowly based theories 
developing into more comprehensive ones. But it is difficult to share their optimism. 


The various chapters show curiously little overlap in conceptualisation. The researchers 
seem to have chosen one aspect of the curriculum or the classroom context with little 
awareness of the wholeness of the everyday reality experienced by teachers. Teachers reading 
their conclusions may find it difficult to recognise the partial realities described by the 
researchers. This is, of course, a real dilemma. The researcher is forced to restrict the focus to 
gain control and precision, but in the process there is a tendency to ignore those parts of the 
classroom process excluded from their individual concerns. Thus a discussion of “pupil 
strategies” concerns itself with dissenting behaviour and virtually ignores on-task behaviour, 
while other analyses focus on the task or deal with both on- and off-task activities but only in 
the specific context of group work. Within science education there is a recognition of the 
effects of “naive” concepts, meaning concepts derived from everyday experience rather than 
from empirical data and abstract conceptualisation. Yet this idea might well have been 
usefully employed in the discussion of putting theory into practice. Surely one of the main 
problems in interesting teachers in any form of educational research is that the 
conceptualisations of the researchers do not overlap sufficiently with the “naive” 
explanations of the teachers. Classroom research, in fact, is the most likely to achieve effective 
communication, particularly when the concepts are “grounded” in the experiences of pupils 
and teachers and illustrated with quotes and observations. Yet it seems from this monograph 
that we still only have the potential, rather than the actuality. 


This discussion should not be seen as a criticism specific to this collection of articles. It is 
perhaps because they contain such rich and relevant conceptualisations that the reader is 
looking for a fuller integration of the various partial descriptions of classroom teaching and 
learning. In many research areas there are now reports of meta-analyses which consolidate 
research findings on a particular topic. What seems to be needed in classroom research is a 
different form of meta-analysis in which the most powerful concepts describing different 
aspects of the classroom are brought together in a way which presents to the teacher 
something closer to “recognisable reality” and so provides something approaching a heuristic 
model of the classroom. This monograph would provide an admirable starting point for such 
a development. 

NOEL ENTWISTLE 


Croll, P., and Moses, D. (1985). One in Five: The Assessment and Incidence of 
Special Educational Needs. London: Routledge and Kegan Paul, pp. 
xiv + 168, £12.95. 


This book is a report of a survey carried out at the University of Leicester with DES 
funding. It is, as the title indicates, a post-Warnock and 1981 Education Act study and hence 
must be seen as a statement of what was found during a period of considerable change in the 
profession. It reports on activities in a representative sample of 10 local authorities. 


There are two distinct aspects to the research: the first surveys 428 junior teachers 
distributed across 61 schools, the other employs systematic observation in 34 junior school 
classrooms for the 8-9 year-olds. The findings are therefore of less immediate relevance to 
teachers in secondary schools where organisation and procedures may be rather different, and 
where the individual teacher may fulfil a less crucial role for any particular child. 


Teachers identified 18.8 per cent of their pupils as being in need of special provision. Of 
these the preponderance (15.4 per cent) were thought to have learning difficulties, 7.7 per cent 
behavioural problems and 4.5 per cent physical problems, although 40 per cent of those 
cited were deemed to have multiple problems. The categories were not tightly specified, and 
therefore the distinctions between behavioural problems, discipline problems, and emotional 
problems may be more apparent than real in the data. Furthermore, the authors' observation 
that the overall percentage is close to the eponymous proportion may constitute less of a 
confirmation of the Warnock estimate than a recognition that the teachers felt that they 
should aim for that authoritative ratio. Not unexpectedly the boys outnumbered the girls in a 
ratio of 2:1 (although the totals in Table 3.7 would seem to deny this). 


Investigating the distribution across schools the researchers found this to be quite even, 
with little evidence of pockets of special needs children in some schools and none in the others. 
However, in a later part of the analysis it is clear that the relative judgments of individual 
teachers are coloured by the overall level of competence in their class. Those with high ability 
groups identified 50 per cent of the children who, on the Spar Reading Test, were between one 
and two years retarded. In the weakest group this figure was down to 19 per cent. 
Whilst happy to support the principle of integrating the special needs child, when faced 
with the possibility of teaching those with a specified handicap, half of the sample often 
declared that they would be reluctant to accept them, or would actually refuse. Previous 
experience of children with the handicap ameliorated the position a little. 


The practice of formal testing was found to be very widespread, with over 90 per cent 
testing for reading at least once every year. Word recognition tests were by far the most 
popular instruments, accounting for one third of all tests used. The authors introduce a 
discussion on the problem of incompatible teacher assessment and test result by referring to 
“two camps” with preferences for discounting one or the other. Analyses then show clearly 
that teachers would not subscribe to such a crude “straw man” dichotomy, but take a more 
rational stance. This is heartening news, but as they can use tests sensibly it seems a pity that 
their choices are so restricted. (N.B. I shall continue my campaign to afford Peter Pumfrey the 
proper spelling of his name!) 


On the monitoring of individual children the reader is told that a 10-second sampling for 
four minutes per subject was used. No control over the time of day or part of the week seems 
to have been exercised, nor are any inter-observer reliabilities recorded for the categories of 
activity observed. Hence this section of the book is deficient in technical detail. The study 
suggests that the overall distribution of time into lessons, individual work, etc., is almost 
identical for all children, although those with special needs spend less time on task and receive 
more teacher attention. In spite of this attention by the individual teacher 66 per cent of 
teachers felt that more help was needed, 30 per cent received no help from colleagues or 
support services and, amazingly, when a remedial teacher was available only 40 per cent of the 
special needs cases were discussed with her. The Warnock goals still seem some way ahead. 
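
The sampling arithmetic and the missing reliability check mentioned above can be sketched briefly in Python. A 10-second sampling interval over four minutes gives 24 coded observations per child. Cohen's kappa is shown only as one common index of inter-observer agreement; it is an assumption introduced for illustration, not a statistic that the book is reported to use, and the function names are hypothetical. 

```python
from collections import Counter


def observations_per_child(window_seconds=240, interval_seconds=10):
    """Number of coded observations from one sampling window."""
    return window_seconds // interval_seconds


def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two observers' category codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both observers must code the same non-empty intervals")
    n = len(codes_a)
    # Proportion of intervals on which the two observers gave the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance from each observer's category frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used a single identical category throughout
    return (observed - expected) / (1.0 - expected)
```

A value near 1 would indicate that two observers coded the same intervals in the same way, while a value near 0 would indicate agreement no better than chance. 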


There are no real surprises in this book, but there is a great deal of detailed information. 
For those who need to know about the general professional dispositions towards special needs 
children in primary schools this will be a useful source of reference. 


GEOFFREY BROWN 


Deci, E. L., and Ryan, R. M. (1985). Intrinsic Motivation and Self-Determination 
in Human Behaviour. New York: Plenum Press, pp. xv + 371, $29.50. 


Judging by the quotations given on the dust cover, this book has attracted considerable 
praise in the authors’ homeland. And I can see the reason. It sets out to do two things, and 
succeeds (with minor reservations) in both. Firstly, it contains an excellent survey of the major 
psychological theories of motivation spawned during the last half century, and secondly it 
shows how each of these theories can be subsumed within the parameters of the authors’ own 
model. If psychologists read it, and take it seriously, then it is difficult to see how they can 
hang on to their own essentially fragmented and partisan approach to what it actually is that 
makes human beings get up off their backsides and relate to the world around them. But then 
I have been making claims like this for important new books in psychology for some years, 
and their impact has always been in the event much less than I would hope. Psychologists, like 
most other people, have the habit of only reading and believing the things they want to 
believe. And, in spite of the best efforts of Messrs Deci and Ryan, doubtless we shall still be 
assailed in the future as in the past by psychologists who claim exclusive rights for operant 
theory, or for information processing, or for dynamic theories or for one of a dozen and more 
others. 


Deci and Ryan take as their central contention that much motivation is organismic, that 
is, it stems from an active organism who acts upon the environment in order to satisfy its 
needs. Motivators are therefore primarily “the spontaneous internal experiences that 
accompany behaviour”, whether that behaviour be rock climbing, painting pictures or one of 
the “countless other things for which there are no obvious or appreciable external 
rewards. . . . Intrinsic motivation is the energy source that is central to the active nature of 
the organism”. Having made the point, and backed it up with careful argument and research 
evidence, the authors then see the next problem as “how to conceptualize this new energy 
source and how to integrate it into psychological theory”. The method they employ is to 
survey, chapter by chapter, existing theories such as attribution theory, operant theory, 
self-actualisation, ego theories, information processing, attitude theories and the like, and show 
how each in itself is not so much wrong as incomplete. Each of these theories tells us a little bit 
about motivation and how it operates under certain circumstances, but none of them is equal 
to the task of giving us the whole picture. Only when we put them together within the context 
of what might be called a “grand” theory do they begin to give us the kind of inclusive
information we want. 


This “grand” theory is the one advanced by the authors themselves, and one which
makes a clear distinction between the organismic or intrinsic motivation, to which I have
already made reference, and the extrinsic sources of motivation defined by Deci and Ryan as
prompting behaviour “where the reason for doing it is something other than an interest in the
activity itself”. Concentrating their book primarily upon intrinsic motivation, they see it as
linked essentially to what they call “self-determination”, that is “human functioning that
involves the experience of choice, the experience of an internal perceived locus of causality”.
Mild words you might think, but consider their full implications. If we accept them, we come 
perilously close to acknowledging free-will. For, even though the authors leave in the word 
“perceived” to cover their backs, so to speak, it is not easy to see how a perceived locus of
causality is any different from an experienced locus of causality. And before we have quite 
recovered from this shock we find the authors saying later in the book that ‘‘the evidence is 
indisputable that intrinsic motivation exists and that it involves non-tissue-based and drive 
independent means’’. Since we all know that the biological definition of ‘‘tissue’’ covers all 
cells and cell products in the human body, this looks suspiciously as if the authors are telling 
us no psychological theory of motivation grounded in biology is of use in explaining 
motivation of the intrinsic type. The true iconoclasts amongst us might go further and say that 
the authors are appealing for a non-material interpretation of mind, but perhaps this raises 
too many hackles to be effectively pursued in a short book review. 


Let me emphasise again that the authors base their views not upon flights of fancy but 
upon weighty evidence. They review over 200 studies taken from every major field of 
psychology, and synthesise these into their grand theory. They are methodical and scholarly 
throughout, and it is hard to fault at least their basic thesis. It is said that in science theories 
are not displaced by facts that fail to fit them but by the advent of superior theories. There are 
no prizes for naming the greatest number of scientific theories that are still hanging around, in 
spite of impressive evidence discounting them, simply because no one has yet come up with 
something in place of them. But at least this need no longer be true of motivation. The authors
have put in psychologically respectable form what we each of us feel about our own lives:
namely, that we are self-determining individuals, with much of our motivation inexplicable in
terms either of internal drive theory or external reinforcement.


Now for the minor reservations to which I referred in my first paragraph. A surprising 
omission in the very full coverage of existing theories of motivation provided by the authors is 
Apter's theory of psychological reversals (e.g., Apter, 1982). This is doubly surprising in that 
Apter's theory would appear to fit almost exactly, at a number of key points, the authors' own 
model. A second reservation is that Deci and Ryan do not seem to have fully thought through
the implications of their model for extrinsic motivation. This is perhaps excusable in that their 
concern is primarily with intrinsic, but they do make numerous references to extrinsic, without 
seeing the link between the two which their own theorising necessitates. Briefly this is that if, 
as Deci and Ryan claim, ‘‘theoretical accounts of the integrative process of development 
require the postulate of intrinsic motivation as the energy source for integration” then, ipso 
facto, external events can only motivate us when they have become internalised. That is, 
extrinsic motivation only works if it is able to produce an internal experience strong enough to 
prompt us into action. It has no power in and of itself, and only gains this power to the precise
extent to which intrinsic motivation becomes mobilised. Something of Deci and Ryan's 
distinction between extrinsic and intrinsic motivation thus necessarily collapses, transferring 
extrinsic motivation from an ‘‘out there’’ experience into an inner, phenomenological one. 
This again would seem to fit neatly onto Apter’s motivational model, which makes it even 
more disappointing that the authors do not seem to be familiar with his work. 


There is nothing particularly new in any of this of course. Back in 1890 William James 
was urging that ‘‘interest’’, which could take many forms, played the major role in directing 
our attention towards the external world. But the importance of Deci and Ryan's work, 
whether directly expressed or through implications such as the one identified in the previous 
paragraph, is that they are able to put general concepts such as “interest” into precise
psychological language, and present them within the context of a research-based discussion
sufficiently tightly drawn to engage even the most sceptical reader. At the risk of labouring 
this point too much, let me give just one further quotation. Intrinsic motivation, the 
“energising” source behind behaviour, produces “a spontaneous, natural tendency toward
differentiation and integration that does not necessitate prods and pushes from the
environment”. Nothing exceptional about that in the mouth of a Jungian, but from two
American psychologists reared and working in an orthodox research tradition, it comes as a 
major revelation. 


If the book finished here it would still be of prime interest to educational psychologists, 
since motivation is at the heart of the educational process. But having made their point, the 
authors then go on to devote nearly 100 pages to the practical applications of their model. 
They look at psychotherapy, sport, work, and above all education, bringing out a range of 
valuable research data and suggestions. Not surprisingly (but very encouragingly) their work 
shows that children learn best in a stimulating environment free from the pressure of grades, 
rewards and controls. Further, the presence of external reinforcers even seems to have an 
actively inhibiting effect in the field of conceptual learning. All good stuff, and not a hundred 
miles away from the sermon that many educationists have been preaching since the 1930s. It 
would be nice to be able to say that Intrinsic Motivation and Self-Determination in Human
Behaviour will at last get the message home. And even nicer to predict that the book will 
change radically the way in which psychologists of all descriptions conceptualise motivational 
factors. But somehow of course I doubt it. Which brings me back to my point about 
psychologists and the habit of only reading and believing what one wants to believe. 


DAVID FONTANA 


REFERENCE 
APTER, M. J. (1982). The Experience of Motivation. London: Academic Press.


Driver, R., Guesne, E., and Tiberghien, A. (Eds.). Children's Ideas in Science.
Milton Keynes: Open University Press, pp. 208, n.p. 


This collection of eight reviews deals with the conceptual schemata which students use in 
order to respond to questions about physical phenomena. These include “Light” (Edith
Guesne), “Electricity in Simple Circuits” (David Shipstone), “Heat and Temperature”
(Gaalen Erickson and Andrée Tiberghien), “Force and Motion” (Richard Gunstone and
Michael Watts), “The Gaseous State” (Marie Geneviève Séré), “The Particulate Nature of
Matter in the Gaseous Phase” (Joseph Nussbaum), “Beyond Appearances: The Conservation
of Matter under Physical and Chemical Transformations” (Rosalind Driver), and “The Earth
as a Cosmic Body” (Joseph Nussbaum). It is an interesting collection, likely to make teachers
think more about learning and in some cases erode their confidence in the validity of their own 
constructs of some physical phenomena. 


The models which physicists currently use to make sense of natural events with 
parsimony, generalisability and predictability have evolved over centuries and benefited from 
the contributions of great imaginations. It would be remarkable if students, given the chancy 
contact with the natural world before formal instruction and the limited experience of science 
courses, were to acquire these concepts fully developed within the span of their school careers. 
Nevertheless, non-physicists including children live in a world which is not totally chaotic, 
where things both man-made and natural frequently behave in predictable ways. Such 
empirical regularities presumably constitute the building bricks of the lay-men's constructs. In 
science lessons not only is the range of pupils’ experience of the physical world extended but 
questions are raised which have no referents in their currently held conceptions. Thus the 
process of accommodation may begin. 


These studies attempt to map the ways in which relatively untutored minds try to make 
sense of physical phenomena and the effects of instruction on developing schemata. The 
methods of investigation used are, initially at least, idiographic: pupils are presented with 
data, often at first hand, and asked to explain or to answer questions put to them by the 
investigator. 
From the answers the investigator attempts to infer any construction the subject has used
to make sense of the data. It is usually the case that these constructs, if they exist at all, differ 
from the constructions of 20th century physical scientists. Some are disconcertingly labile 
depending on context. Others seem to be very resistant to change, given currently popular 
teaching methods. There are often signs of a lack of logical internal consistency between sub- 
constructs. 


The search for patterns among these “alternative constructs”, which is often
accompanied by a methodological shift to pencil and paper items given to samples of pupils,
has led to comparisons between subjects of different ages, from different cultures and 
between some pupils’ constructs and those of natural philosophers of earlier times. 


The book is addressed to teachers ‘‘to present findings which will be helpful in enabling 
the teaching of science to be better adapted to children’s understandings’’. In this respect, 
although most of the contributing authors include some reference to implications for 
teaching, and there is an admirable summary on the penultimate page, these issues were not 
explored as exhaustively as one might have expected. 


The problems faced by teachers and researchers who wish to probe “children’s
understandings” of physical phenomena in particular are formidable. One is the influence of
context, which the authors recognise: “(the problem of) . . . devising ways of probing
thinking which enables us to sort out the status of the responses we obtain; to distinguish
between those ideas which play a significant part in the thinking of an individual or a group
and those which are generated in an ad hoc way in response to the social pressure to produce
an answer in an interview or test situation”.


Another problem exists in the mind of the interviewer. She comes to the interview with a 
working vocabulary and her own construct of the phenomenon under investigation and a 
partial knowledge of constructions she has inferred from other interviews. Misinterpretation 
of children's statements and possible distortions of their meaning are likely under these 
conditions. Linguistic considerations are critically important. Conversational transactions can 
yield data only when the overlap of shared meanings is above some irreducible minimum. 
Vague, undifferentiated notions which may be connoted by the words children use are 
distanced from the denotative technically precise vocabulary of the physicist. Some of the 
contributing authors evidently recognise this but some pay insufficient attention to this aspect 
in their haste to construct pupils' alternative constructs.


With these reservations, this collection of studies is fascinating and potentially very useful 
for the audience to whom it is directed. As teachers read snippets of interviews and try to
answer the questions, their own constructs will come under introspective scrutiny. The
responses of pupils and students and in some cases other teachers will alert them to the evident 
inadequacy of expository teaching accompanied by verificatory laboratory work, and cause 
them to reflect on alternative methods which might facilitate conceptual exploration by 
pupils. 

Production of the volume left much to be desired. There were some proof-reading errors. 
Diagram (Figure 3-8 (c), p. 45) in David Shipstone's chapter is missing and the quality of the 
figures is variable. 

JIM EGGLESTON 


Durkin, K. (1985). Television, Sex Roles and Children. Milton Keynes: Open 
University Press, pp. 148, n.p. 


Many educationists are currently concerned with the influence of sex roles on a child's 
educational career, and are also very much aware of the influence of various social forces 
outside of the school. Durkin’s book examines one particularly pervasive form of such 
influence, television. 


The book is neatly structured. It begins with a chapter defining some basic terminology 
and moves on to examine the content of television in terms of sex role portrayals and the 
genetic, social-learning and cognitive-developmental approaches to theorising about sex role
acquisition. The coverage of these positions is reasonable but one suspects that it might be 
both too brief for the novice reader and too lengthy (even long-winded) and light-weight for 
those already familiar with this work. 


In Chapter Four, Durkin turns to the main concern of the book, the effects of television 
viewing on children’s sex role development. Given the extensive sex-typing of behaviour that is
found on television, Durkin is at pains to demonstrate that the often cited “linear effect”
hypothesis (the more a child watches television the more he or she will be sex-typed) is not 
substantiated by the evidence. This point is perhaps laboured. However, as Durkin himself 
points out, such simplistic ideas often seem to hold currency for much longer than they 
deserve in thinking about both the effects of the mass media and about the nature of sex roles. 


More interestingly, Chapter Five examines the evidence relating to the more detailed 
aspects of the child’s understanding of the content of television. Chapter Six looks at the
evidence concerned with the effects of programmes that contain a message that runs counter 
to the conventional sex role stereotypes. Durkin concludes that such messages are likely to 
have some effect on children’s attitudes, at least in the relatively short-term, but is careful to 
point out that this does not necessarily imply that television might have a general impact upon 
sex role development. In his conclusion Durkin argues for the adoption of a developmental, 
social psychological perspective to the study of the effects of television on children that would 
emphasise the role played by the children themselves in their own socialisation (by actively 
searching for relevant information and incorporating it into their existing conceptions) while 
also emphasising the role played by the social context in which television watching takes place. 


While the conclusion is persuasively advanced, and is one that many current readers 
would be inclined to agree with, it does not make for a very convincing point at which to end. 
To be told that the analysis so far enables us to ask better questions may well be correct and 
important, but it is not likely to satisfy many readers. Durkin himself cannot be held entirely
responsible for this as he can only review the literature as he finds it. He might however have 
had more to say about the ways in which useful advances might be made. 


A major problem with the existing literature is that the focus for research (the acquisition 
of sex roles) is too broad and diffuse. Sex role identification becomes interesting, for the most 
part, when it displays itself in particular behaviour patterns in specified circumstances. In
education for example, sex roles are relevant in as much as they lead girls and boys to hold 
specific ideas about their competencies in respect to defined areas of an actual curriculum. 
The effects of television might be better understood if researchers took greater care to analyse 
the particular aspects of behaviour that might be influenced and the contexts in which that 
influence might be noticed. The positive effects found for counter-convention portrayals 
might owe much to the more specific definitions of the target behaviours that are of concern in 
these studies. 


In conclusion, this is a clearly written book that can be readily read and absorbed. 
However, the state of affairs that it describes is too inconclusive and too lightly analysed for
the book to have much impact on either the newcomer or the already well-informed. 


COLIN ROGERS 


Fontana, D. (Ed.) (1984). Behaviourism and Learning Theory in Education.
Edinburgh: Scottish Academic Press, pp. 195, £7.50 pbk.


The purpose of this Monograph, according to the editor, is “to survey important aspects
of the contribution currently made by behaviourism and learning theory within formal
education and within those fields, such as counselling, closely associated with it”. The text is
divided into two parts, “Behaviourism and Learning Theory in the Classroom” and
“Behaviourism and Learning Theory in the Wider Educational Context”. Each part is
followed by editorial comment. 


The opening chapter by Blackman, which is rather surprisingly included in the first part, 
provides a stimulating introduction to the past, present and possible future, and briefly 
explores a number of interesting issues. Most notable amongst these are the implications of a
move towards cognitivism in comparative psychology and the impact of cognitive learning 
theory upon behavioural technology, the latter being a theme which is explored further in 
subsequent chapters. The view of behaviourists denying that humans have a mental life or 
private experience is attacked with apparent enjoyment. 


The chapters which follow in the first part of the text have a difficult act to follow. The 
next three are concerned with the management of social behaviours, precision teaching and 
behavioural objectives, and sequenced in this way they do tend to complement one another. 
The final chapter, entitled “The Observation and Classification of Classroom Behaviours”,
broadens the scope by examining various approaches to observation, but does not tackle the 
difficulties inherent in observation and classification, a topic which the reader might 
reasonably expect to be addressed under such a title. The second part of the text consists of 
chapters on behaviourism in the closed community, behavioural guidance and counselling, the 
behavioural approach to language development and cognitive behaviourism. 


The editorial comment which follows each part is brief and pointed. The editor treads 
warily; there is unavoidably a tendency to tell the reader what to conclude from previous 
chapters, but more importantly, the comments bring in additional information and add a 
further coherence to the text.


The structure of the Monograph reflects its purpose and the topics included are well 
chosen. How successful the text is depends naturally on the quality of the individual 
contributions. These vary considerably along a number of dimensions, most notably in the 
ratio of references to discussion. There is a tendency for some chapters to bombard the reader 
with references to the extent that it is felt that the only way of agreeing or disagreeing with the 
authors’ views is to chase up the primary sources. Conversely, in chapters in which discussion
lengthens, there is a perhaps natural tendency for elaboration of the authors’ own research to 
the exclusion of other pertinent work. Two chapters which merit exclusion from these 
criticisms, however, are those by Kiernan on language development and Vingoe on cognitive 
behaviourism. Kiernan’s chapter stands out particularly as an informative, readable 
examination of behavioural theory in relation to language development. 


In more general terms, this is a text which can be strongly recommended to behaviourally
inclined psychologists. They should find the varied applications of behaviourism which are 
presented add breadth to their thinking and, I suspect, some of the viewpoints expressed will 
challenge their previously held beliefs. The less behaviourally inclined may find much of the 
text too densely packed with information to be an easy read, yet a little perseverance should 
reveal that, in the words of the editor, ‘‘behaviourism has a good deal to offer within all areas 
of education’’. 

ALEX HARROP 


Francis, H. (1985). Learning to Teach: Psychology in Teacher Training. Lewes: 
Falmer Press, pp. 202, £6.95 pbk.


This is an important book, which deserves a wide readership within the teacher education 
community and amongst others whose decisions have power to influence teacher training 
programmes. It is important, partly because it speaks to the contemporary situation, in which 
the position of psychology (along with the other disciplines of education) in such programmes 
is being challenged, both for reasons which are entirely proper and for reasons which derive 
from the dogma and faith that have all too frequently dominated discussion of teacher
education. 


It is important also because its contributors, while subscribing to no consensus view, 
clarify the uneasy relationships between psychology and the congeries of activities that make 
up the professional preparation of teachers — and in so doing expose some of the tensions and 
possibilities within the discipline itself. 


The book is offered as ‘‘a serious contribution to curriculum development in teacher 
training’’. Essays from a dozen teacher educators are grouped into three sections: the training 
and development of teachers; becoming a teacher; and challenges for teachers and trainers.
The adequacy of these three section titles to convey the distinctive thrusts of the various essays 
is not great, but Hazel Francis’ linking passages are helpful. And it is unlikely that the book 
will be read as a continuous text. 

The first two essays set the scene quintessentially, firstly through an historical account of 
the part played, and claimed, by psychology in the formation of teachers, and secondly 
through an analysis of student reactions to psychology courses and of the qualifications and 
experience of those who teach psychology on ITE courses. Margaret Sutherland’s essay 
highlights the early aspirations expected from the “marriage” of education and psychology
(the latter bringing as its dowry a science that would establish “laws of human development
and behaviour”), followed by a “break-up” beginning in the 1950s when other suitors such as
sociology, philosophy — and later politics — urged the view that education was too complex 
and contested a notion to be helped uniquely by psychology. Bernbaum's essay takes the story 
forward by showing how the ‘‘action-orientation’’ of many teacher training staff and the 
preferred reading of students both suggest a tension that has been unresolved in many ITE 
courses, a tension between those who are pre-occupied by the immediate and unavoidable 
demands of schools and those who properly resist the view that psychology can give simple 
recipes for classroom success. 


Brahm Norwich in his essay on professional socialisation picks up a different strand of 
the same broad problem of establishing a credible model of teacher education. The 
dissatisfactions felt by students with many psychology courses can be paralleled by a wider 
bewilderment about what should constitute a proper knowledge base for a teaching force that 
aspires to be viewed as a profession. 


Geoffrey Brown's essay, which concludes the first section, discusses difficulties that arise 
in securing effective co-operation between teachers and university tutors in developing an 
innovative PGCE course: his account neatly instantiates problems of the kind identified by 
Norwich, and — more loosely — the issues raised by the first two essays. 


The three essays forming the second section, by Noel Entwistle, by Guy Claxton and by 
Peter Tomlinson and Roy Smith, look at ways in which students can be helped to acquire the 
skills and qualities required for teaching. Entwistle's argument is that it is possible to use both 
research into student learning in higher education and the student's ability to reflect on his 
own HE experience in order to sensitise the student to problems that he will encounter as a 
schoolteacher. The essay is complex without being arcane: a teacher educator could well draw 
on its implications for, say, PGCE course design. Claxton takes classroom competence as the 
focus for ITE courses, unpacks the notion, clarifies the relationships between teaching and 
learning, and shows how psychology might inform the process of becoming competent. A 
useful aspect (one often undervalued in ITE literature) is his analysis of how psychology offers 
a variety of techniques for helping students to cope with the stress and discomfort of teaching. 
The Tomlinson and Smith paper provides a concise but effective treatment of skill and skill 
training, with a persuasive report of an experiment using radio-assisted school practice. 


Each of the concluding four papers deals with a challenge. Charles Desforges examines 
the evident lack of impact made by psychology on the problem of helping teachers to manage 
learning in the primary classroom. His commentary on the poor focus of many courses, on the 
limitations of available psychological evidence, and on the directions in which courses should 
be modified is difficult to summarise, but full of signposts that are worth following up. Peter
Evans in his piece on special needs covers fairly familiar territory, but makes pertinent, if 
often unrecognised points about differences of approaches required for different areas of the 
curriculum, and about behaviour problems. David Fontana contributes a timely piece on 
personal development, a piece which, like several others, draws on psychological writings that 
would be ignored if a narrow instrumentalism were seen to be the only contribution of
psychology to ITE. This problem of an unsatisfactory narrow view of psychology is one 
starting point for the final essay, a treatment by Hazel Francis of the possible ways in which 
psychological understanding can help in curriculum planning. 


The variety and fertility of the approaches presented in Learning to Teach suggest that, 
while there can be for psychology no return to the certainties and optimism of Victorian times, 
the field deserves more imaginative treatment than many courses allow it. 

JIM GARBETT


Gottfredson, G. D., and Gottfredson, D. C. (1985). Victimisation in Schools. New
York: Plenum, $29.50.


The book seeks to inform policy makers in the mid-1980s by examining questionnaire
data from principals, teachers and students collected by the “Research Triangle Institute as
part of the NIE (1978) Safe School Study” and “data about the communities in which the
schools were located, prepared from 1970 census counts by Dualabs” (p. 21). The analyses,
the authors inform us, are “school-level ecological analyses” and thus “They do not
necessarily apply to individuals (. . .), and readers are cautioned against interpreting in terms
of individual characteristics or individual-level phenomena” (p. 26). Thus, the results of the
analyses, apparently, are doubtful concerning individuals and out of date concerning schools
and communities. Nevertheless, we are presented with a list of researchers who “have adduced
incontrovertible evidence that delinquency rates vary in regular ways across social areas” (p.
62). These include strong correlations between “robbery levels and percentage black”,
negative correlations with the percentage of high school graduates, “percentage of families at
75 per cent of the poverty level or below (0.46) and percentage female-headed families
(0.53)”. The underlying theory of these kinds of analyses is the “social disorganisation”
perspective or a control theory of deviance. Somehow disorganised areas (where presumably 
control has broken down) are distinct from other organised (middle-class, wealthy?) areas. 
Community as defined statistically draws a convenient line between people who are treated 
very differently by the economic and social structures of society. 


Since all community data are derived from categories developed by statisticians who are 
outsiders to the communities, the purposes, experiences and ways of analysing social 
experience by the members are disregarded in favour of whatever purposes the data collectors 
and analysers had in mind. Hence the inherent social logic of the communities as defined by
their members is not taken into account in the sampling procedures, the development of
questionnaires and the definitions of social populations. The sources of various kinds of 
influences on schools are looked for within these artificially concocted communities instead of 
those social structures which allocate opportunities in terms of social and economic privilege 
and disadvantage. 


We are told ‘‘The reason we seek to measure disorder in the present research is in part due 
to our desire to assess a moral construct — to make statements about the effectiveness or 
harmfulness of the school socialisation properties, including governance, fairness, student- 
teacher interaction, and relevance of the curriculum on school disruption or delinquency” 
(pp. 119-20). They appear to do this without critically inquiring into the social construction of 
moral constructs (by say, social scientists, politicians, students, teachers, community 
members) and without observing life in schools and communities, or examination of school 
curricula. They are merely rated from a number of questionnaire items. Based on these, the 
critique offered is to present alternative models of factorising the data. They conclude that 
“better measures of school disruption are required”. Thus, “in planning future surveys,
victimisation reports must be supplemented by self-reported delinquent and disruptive
behaviour, and wherever possible by other objective evidence about the amount of
victimisation and disorder occurring in the schools” (p. 166).


The final policy recommendations are fairly limited. For example, there should be smaller 
schools with greater teacher contact “with a more limited number of students”, but this
does not mean smaller classes (p. 172). The upshot is to provide problem students with a stake 
in conformity (pp. 171, 192). However, nowhere is there provided what is most needed — an 
educational critique of schooling and its relation to society. The book, like many of its kind, 
offers no vision of possible educational futures providing practical courses of action for both 
the economically and socially disadvantaged and policy makers. 

JOHN F. SCHOSTAK 


QUICKE, J. (1985). Disability in Modern Children's Fiction. Beckenham: Croom 
Helm, pp. 176, £12.95. 


This book, written by a social psychologist, “presents a case for the inclusion of a 
planned element in the mainstream curriculum, specifically designed to encourage positive 
attitudes and actions towards children with special needs, and for utilising the possibilities 
inherent in fiction for helping all children to explore their thoughts and feelings in this area”. 
An avowed supporter of integrationist policies for children defined under the 1981 Education 
Act as having special educational needs, Quicke has written a book which consists in the main 
of reviews of fiction which relates to disability and which might be used by teachers. The 
reviews are critical, in line with his belief that in the past many reviewers have tended to 
abandon critical rigour when considering such material because, he claims, it is too often 
assumed that any book where a disabled person is a central character “serves an important 
social goal”. His reviews make considerable use of ideas and findings from the psychological 
literature. 


Most of the books reviewed have important characters who have obvious disabilities of a 
physical or mental nature although some have child characters with mild emotional disorder 
or a mild or moderate learning difficulty. Throughout, Quicke focuses on the treatment of the 
relationships within which the disability is contextualised, not on the author's treatment of the 
disability itself. This influences the organisation of the book. Rather than dealing with a specific 
type of disability, each chapter reviews a number of works which treat a particular 
issue. In one chapter the issue is the realities of family life where one child is severely disabled; 
a second treats “The struggle to be myself”. Other chapters are concerned with the treatment 
in novels given to friendships between the disabled and non-disabled; forms of reaction 
amongst “normal” children to their disabled peers; the “diagnosis” and “treatment” of 
emotional and learning difficulties; love; children and disabled adults. There is also a chapter 
on literature for young children. 


In each chapter a number of works of fiction are outlined briefly and the treatment of 
relationships reviewed critically. The criticism is well supported by references to relevant 
psychological literature and each chapter has a useful summary which in effect highlights the 
bases of the author's critique. For example: 


“The author's ambition appears to be to write a horror story rather than to 
genuinely explore attitudes towards disability. Similarly, in Unleaving the author is so 
intent on not pulling any punches that she goes further than necessary in her portrait of 
the severity and hopelessness of the disability. 


“The compassionate books in this collection — like Welcome Home, Jellybean — 
manage to combine a ‘tell it the way it is’ approach with a subtlety and humour in the 
writing appropriate to the sensitivities and experience of the age group. When reading 
these books we really do feel that the author has not refrained from writing about 
difficult topics, e.g., self injury, the mother's emotional state, the complexity of the 
sibling's feelings, but in a way that leaves us with a sense of the meaningfulness of the 
human life described and a feeling that the characters have survived, despite having had 
to cope with enormous problems. There has been a struggle for life which the characters 
are beginning to win.” 

Any attempt to analyse, categorise and review in the field of children's fiction brings its 
own difficulties as the author acknowledges. In this instance the reader has difficulty in 
holding in mind the plots outlined, critical commentary, author's speculation and empirical 
evidence when working through a chapter. The author obviously has an extensive knowledge 
of the children's fiction he is dealing with and his reader is likely to be lacking in psychological 
background, acquaintance with children's literature or both, which means that those for 
whom the book will be useful will have to work quite hard to acquire the knowledge and 
insights which they seek. But then one does not want teachers to select fiction to use with 
children without reflecting in some depth and with care as to what they wish to achieve and its 
suitability for their purposes. 


How valuable is fiction in which main characters are disabled in improving the 
understanding and attitudes of children to those who are themselves disabled? How might 
such fiction be profitably used by teachers? The author acknowledges what he terms “a 
problematic area concerning the interaction between literature and social psychology” which 
is not explored in this book. He makes the assumption that such fiction is likely to effect 
change and that the study of “good” fiction may well provide children with valuable insights. 
In a final chapter he provides guidelines for choosing appropriate literature together with 
brief suggestions about the pedagogy which might be employed and suggests ways in which 
teachers might monitor the effects of introducing and using the literature in schools. For those 
willing to undertake this important task the book is a very useful guide to the novels and a 
worthwhile introduction to the psychological literature with which they might work. 


ROY WHITTAKER 


SELTMAN, M., and SELTMAN, P. (1985). Piaget’s Logic: A Critique of Genetic 
Epistemology. London: Allen and Unwin, pp. 353, £30.00. 


SHULMAN, V. L., RESTAINO-BAUMANN, L. C. R., and BUTLER, L. (Eds.) (1985). The 
Future of Piagetian Theory: the Neo-Piagetians. New York: Plenum, pp. 222, 
29.50. 


Here are two very different responses to the Piagetian tradition. Piaget’s Logic is an 
outright attack on all the major concepts and a total rejection of the theory. The authors see 
Piaget as a tragic figure, attempting, as they put it, to liberate the human mind by its careful 
study, but, instead, falling victim to pseudo-scientific explanations. On the other hand, The 
Future of Piagetian Theory reports the work of Genevan researchers who see themselves as 
extending Piagetianism by exploring issues neglected by the classical tradition. They see these 
extensions as complementing Piaget’s emphasis upon the logical nature of the thinking of the 
epistemic subject. 


When Piaget penetrated British and American psychology teaching in the late 1950s and 
early 1960s, he was, indeed, greeted by some of us who were then being initiated into the 
subject as a radically liberating influence. At last there was an alternative to the numerous and 
unlearnable postulates and corollaries of Hullian learning theory. At last, here were findings 
obtained from human beings, and, most important of all, a theory which related to the higher 
cognitive functions of human beings. Maybe the Seltmans are too young to remember this 
small rush of excitement. For them the significance of Piaget is that he represents, as a 
symptom, the failure of a philosophy which denies creativity and openness to experience as the 
source of new knowledge. Here is a sample of their style of attack: 


“The historical-evolutionary concept (Piaget’s), very much the product of the 
nineteenth century, is wholly linear and logico-mathematically conceived. It is more to do 
with predictable sequence than creativity. On the other hand, becoming-in-the-context-of 
being, and vice-versa (Hegel’s conception of it), expresses co-existent aspects of 
Actuality, that is, the coincidence and co-terminousness of being-and-becoming’’ (p. 6). 


I have perhaps chosen the worst case of over-blown prose in the book, but it is, 
unfortunately, symptomatic of general faults. It is grossly unjust to categorise Piaget’s 
approach as a remainder of positivism: his views on explanation are too complex to be thus 
subsumed. Important philosophical issues are inherent in the reference to Hegel, but they are 
alluded to and not dealt with squarely; a few paragraphs on, phenomenology and 
existentialism, which do stress becoming in the sense that the authors appear to intend, are, in 
fact, dismissed in two sentences. All in all, there is little evidence of considered argument in 
the portrayal of various theoretical alternatives. 


The book disappoints because there are important questions to be directed towards the 
status of many Piagetian concepts. There has long been some unease, for example, about the 
notion of equilibration. As used by Piaget, this concept conceals a fatal contradiction: it is the 
psychological explanation for the closed nature of logical operations and at the same time is 
said to represent “open systems in a steady state”. This kind of conflation vitiates the whole 
of Piagetian theory because the openness of thinking is implicitly and falsely subsumed under 
the categories of logic. In articulating this critique Piaget's Logic fulfils a useful function, but 
it tends to be dogmatic in style and the authors show no evidence of being able to consider 
the defences that could well be mounted from a Piagetian or neo-Piagetian perspective. 


On the other hand, the authors who contribute to The Future of Piagetian Theory 
broadly accept the theoretical framework but attempt to enlarge it through the study of 
variables originally neglected — social and linguistic variables and individual differences, 
chiefly. There are particularly valuable chapters by Doise, who has pursued the thesis that 
social interaction exerts a causal influence on cognitive development, and by Chipman, who 
explores the interaction of cognitive and linguistic competence in the development of 
comprehension strategies. Among others, there are papers on the infant visual system, on the 
self-image, and on the implications of the Piagetian approach for teaching. 


As the editors point out, if Piaget were alive today, he would no doubt wish to call 
himself a neo-Piagetian in the sense intended by the contributors to this volume, because he, 
contrary to the Seltmans, saw his own work as an open system that could be successively 
broadened. The Future of Piagetian Theory succeeds, I think, in presenting Piaget in this 
light. 

NEIL BOLTON 


WILKINSON, L. C., and MARRETT, C. (Eds.) (1985). Gender Influences in Classroom 
Interaction. London: Academic Press, pp. 288, £33.00. 


The papers in this collection are important, both as individual research reports and taken 
together. The data on sex inequalities in classroom interaction, and their effects on 
achievement in maths and science, are so serious that every parent, every teacher, and every 
educational researcher should be aware of them. There are, however, several shortcomings in 
the volume, and it would be far more impressive if the authors and editors had been less 
insular, both geographically and academically. The collection is narrowly American, and 
psychological, which led them to neglect relevant studies done in other countries and by 
researchers from other disciplines which would support their arguments about sex equality. 
Their insularity has weakened the force of the researchers’ case for public concern and further 
research on inequalities in classroom interaction. 


The papers were originally presented at a conference held in 1983 at the University of 
Wisconsin — Madison. All the contributors are American (or work in American institutions), 
and all the references in all the papers are American except for one mention of Piaget. Most of 
the papers also neglect, or ignore, research done by classroom researchers from other 
disciplines such as educational anthropology or sociology. These two areas of literature 
(non-American and non-psychological) contain several studies of sex differences in classroom 
interaction, peer group attitudes to learning, and teacher expectations which would augment 
what the editors describe as a relatively small set of studies. The editors refer to the large 
literature on sex differences, and the substantial one on classroom interaction, but argue that 
studies which ‘‘compare the classroom interaction of boys and girls” are scarce. They then 
claim that they have drawn together in the volume what is known. Unfortunately, they have 
actually drawn together only the American psychological research. 


The insularity apart, the collection does have strengths, especially the coherence of the 
volume. All the authors are members of the “invisible college for research on teaching”, an 
informal group of scholars who study teaching processes using classroom observation 
techniques. They therefore know each other’s work, cross-refer to each other, and this means 
that the papers are mutually reinforcing. The collection is, therefore, a thorough overview and 
summary of the psychological, American research on the topic, and deserves to be in every 
education library in the country as a reference work for staff and students. It is not, however, 
an easy read. All the papers require close attention, as they contain data and are densely 
written. 


The chapter by Jere Brophy, a leading figure in classroom research, research on teaching, 
and studies of the teacher expectancy effects, is the keystone of the collection. His work in the 
last 15 years underlies many of the other investigations, because he was one of the first 
researchers to show how teachers’ attitudes were expressed in their interaction patterns with 
their children. Brophy reviews all the American psychology literature, paying particular 
attention to the claims that young boys are handicapped in reading by the “feminine” 
atmosphere of the elementary classroom, and that older girls are disadvantaged in secondary 
maths and science by the regime in the high school classroom. Brophy's conclusion, on the 
basis of the review, is that there are significant sex differences in pupils' experiences of 
classrooms, which do correlate with learning outcomes. He is convinced by the evidence that 
the sex of the teacher does not have any noticeable influence on pupil learning because men 
and women teachers share similar ideas about “what males and females should be like”. 
These ideas about masculine and feminine roles, and the expectations teachers hold for pupils, 
do produce differences in classroom interaction. Brophy concludes that: 


teachers of both sexes project the same differential expectations toward boys and girls 
and show the same tendencies to respond differently to gender-related student behaviour 
(p. 139). 


If Brophy is right, and the evidence he has marshalled seems convincing, then in-service 
education needs to be directed to male and female teachers equally. 


Some of the evidence summarised by Brophy is presented in the other 10 chapters in the 
book. Five of them focus on pupil-pupil interaction and attitudes from pre-school to junior 
high. The remaining five are based on classroom observations of teacher-pupil interactions, 
mostly using pre-coded schedules. 


Greta Morine-Derschimer’s paper is based on interviews with pupils to discover what they 
thought were the “important interactive characteristics of their peers”. The sample (73 
elementary school pupils in a white suburb of New York) were asked to tell the researcher 
about the “other kids” in their class. Boys and girls differed in the importance they attached 
to various characteristics in their classmates. Girls were more likely to mention friendliness 
and classroom behaviour, boys to list talents, interests and academic ability. There were also 
differences in peer group values in the classrooms of the various teachers. Using both these 
variables, Morine-Derschimer concludes that: 


a productive learning environment for females may be unproductive for males and vice 
versa (p. 259). 


Noreen Webb and Cathy Kenderski looked at a similar topic, but among teenagers, by 
tape-recording their small-group interaction in maths classes. They found that boys were more 
successful in getting help with their maths from peers than girls were, because both boys and 
girls answered questions from males, whereas girls’ questions were ignored by boys. This 
finding is worth further investigation, as it has implications for all co-operative learning 
situations. The research by Wilkinson et al. involved similar data collection on elementary 
school children, and found a similar pattern of males dominating the small-group interaction 
while doing maths. Both these studies would have benefited from reading parallel British 
work, such as that by Sarah Tann (1981), which bears on the same important issue: sex equity in 
the classroom. 


One of the most interesting papers on teacher-pupil interaction is by Linda Morse and 
Herbert Handley on young adolescents in science classes. They found that teachers interacted 
more with the boys, asking them more questions and giving them more reinforcement and 
feedback. The girls’ participation was mainly answering pre-prepared, factual questions: 
intellectually stretching interactions only happened between the staff and their boy pupils. The 
authors feel this goes some way towards explaining the lack of females in science courses and 
occupations in the USA. They fail to mention any of the relevant British work on science 
classes. 


As an ethnographer I found the paper by Linda Grant on 6-year-olds examining both sex 
and race as variables particularly interesting. While the overall pattern of teacher-pupil 
contacts was exactly what the reader would predict (white girls have most interaction with 
teachers, then black girls, then white boys, and then black boys) Grant found that there were a 
few white girls who were notoriously bad, and a few black boys who were particularly good. 
Analysing why some pupils develop quite different patterns from the majority of the children 
of the same sex and race is a major task for researchers who want to break the cycle of low 
achievement and bad behaviour in the black male. 


This book contains nine reports of original research on the topic of gender differences 
and classroom interaction processes. Each one is worth reading, and together they throw light 
on how boys and girls end up experiencing schooling rather differently. If only the authors 
had looked across the world for corroborative studies and linked their work to that of other 
scholars the volume would have been more useful still. 

SARA DELAMONT 


THE EXPERIMENTAL USE OF PERSONAL CONSTRUCTS IN 
EDUCATIONAL RESEARCH: THE CRITICAL TRIAD 
PROCEDURE 


By KEITH POSTLETHWAITE 
(Department of Educational Studies, University of Oxford) 


AND JOS JASPARS* 
(Department of Experimental Psychology, University of Oxford) 


Summary. The paper discusses a modification of the triadic elicitation procedure which is 
commonly used to reveal subjects’ personal constructs. The modification enables 
personal construct methodology to be used in the experimental, rather than descriptive or 
diagnostic exploration of research problems. The paper discusses one application of this 
modified procedure which relates to teachers’ assessment of pupils’ potential in physics. 
On the basis of the new elicitation procedure hypotheses were formed about the 
relationship between teachers’ ratings of pupils on individual constructs, and teachers’ 
overall assessments of pupils’ potential. These hypotheses were tested using repertory grid 
data and results consistent with favourite and disqualifying cue models of person 
perception were obtained. 


INTRODUCTION 


Kelly’s (1955) personal construct theory, and its associated methodological 
techniques, have often been used in studies which are essentially descriptive in 
character. For example, Nash (1973) used a personal construct approach to study 
the way in which primary school teachers characterised their pupils, Fairbairns 
(1978) considered the constructs used by women in connection with work, Norfolk 
(1979) researched the constructs used by marriage guidance counsellors in describing 
their clients. Other descriptive studies have focused on sub-groups who are in some 
way handicapped or at risk. For example, Gordon (1977) studied the constructs by 
which deaf adolescents described other people, and Norris (1977) explored the ideas 
of self that were held by offenders sentenced to detention centres. If this focus on 
“at risk” groups is further narrowed to “at risk” individuals, the personal construct 
approach becomes an aid to diagnosis in a clinical setting. Kelly himself wrote much 
about this use of personal construct theory in the context of psychotherapy (Kelly, 
1955, especially Volume 2). More recently Bannister and Bott (1973) have described 
this application at some length. They provide an account of a series of case studies 
during which they used personal construct methods to help clients to reveal their 
own understanding of their individual problems. 


This paper explores a modification of personal construct methodology — in 
particular, a modification of the common triadic elicitation procedure. (For a 
description of the usual procedure see, for example, Fransella and Bannister, 1977.) 
We argue that this modification allows personal construct methodology to be used 
as a tool for the experimental exploration of an issue, thus extending its application 
beyond the important fields of description and diagnosis which have been discussed 
above. 

The context in which our modification of the triadic procedure was developed 
was that of a study of teacher-based identification of pupils with high potential in 
physics. Full details of this study have been reported elsewhere (Postlethwaite, 
1984). In one section of the research we wished to understand the part which 
individual constructs, held by a teacher, might play in colouring that teacher’s overall 
judgment of pupils’ potential. We therefore collected test-based evidence of pupils’ 
potential in physics, and teachers’ overall judgments of whether pupils were of high 
potential (i.e. in the “top 10 per cent” relative to their peers in their school in terms 
of potential for future performance at GCE O-level) or not. We then made up triads 
of pupils such that there was a known combination of test-based measurement of 
potential, and of teacher-based judgment of potential, for the pupils within each 
triad. On the basis of these characteristics of the pupils in each triad we were able to 
formulate hypotheses about the function that the corresponding constructs would 
play in relation to teachers’ overall judgments of pupils’ potential. These hypotheses 
were then tested using data from the full repertory grids completed by the teachers. 
Results consistent with “favourite cue” and “disqualifying cue” models of person 
perception (Cook, 1979) were obtained and the nature of such cues in relation to this 
particular judgmental task was revealed. 

*Jos Jaspars’ tragic and untimely death occurred shortly after this paper had been submitted to the 
Journal. The paper is substantially unchanged from the version on which he worked. It is therefore right 
that he should be regarded as joint author, though responsibility for remaining errors is clearly mine. 


METHOD 

In this section we will discuss the way in which the idea of favourite and 
disqualifying cues can be assimilated into a personal construct framework. We will 
then indicate how triads can be constructed so that hypotheses can be set up 
regarding the role which poles of the resulting constructs might have as favourite or 
disqualifying cues. We will then indicate how these hypotheses can be tested using 
repertory grid data. 

The relationship between the concepts of favourite and disqualifying cues, and 
constructs is essentially a simple one, but we will discuss it here as it allows us to 
introduce some terminology that will be helpful elsewhere in the argument. We 
begin by discussing the disqualifying cue. 

We suggest that the idea of a disqualifying cue may be helpful in trying to 
understand why teachers overlook some pupils who, according to the tests, have 
high potential for the subject. We label such pupils as (TP̄) (nominated by the test 
(T) but not by the teacher, i.e. not of high perceived potential (P̄)). The idea that a 
teacher is influenced by a disqualifying cue in forming an overall judgment about 
such pupils suggests that s/he may use a construct, one pole of which is so firmly a 
negative correlate of that teacher's concept of high potential, that pupils placed 
at this pole will tend to be overlooked (labelled as (P̄)) irrespective of their 
position in the rest of the teacher's construct space. 


Similarly, the idea of a favourite cue may be helpful in trying to understand 
why teachers choose some pupils who, according to the test, do not have high 
potential for the subject. We label such pupils (T̄P) (not nominated by the 
test (T̄) but nominated by the teacher, i.e., of high perceived potential (P)). The 
idea that a teacher may be influenced by a favourite cue in forming judgments about 
these pupils suggests that s/he may make use of a construct of which one pole is so 
closely associated with that teacher's view of high potential in the subject, that 
pupils placed at this pole will tend to be chosen (labelled as (P)) irrespective of their 
position in the rest of the teacher's construct space. 

There is no guarantee that disqualifying and favourite cues will lie at opposite 
poles of the same construct (e.g. a teacher's disqualifying cue may be “uncouth 
behaviour” but his/her favourite cue may be “good at arithmetic”) so care must be 
taken to ensure that both kinds of construct can emerge from a personal construct 
interview. 


It was this consideration that suggested the modification of the triadic 
elicitation procedure. Of course, if the overriding consideration governing the 
elicitation procedure is that it should provide a representative map of the whole 
range of constructs used by the individual in relation to a particular kind of element 
(as is likely to be the case in a descriptive or diagnostic study), then the triads we 
present to the individual should be representative of all possible triads that could be 
constructed from the relevant list of elements; and this, in turn, implies that the 
most suitable way of choosing the three elements for each triad is on the basis of 
random selection. This is clearly stated by Yorke (1978) who advises that “. . . 
without any strong justification for choosing to use particular triads it is probably 
preferable to assign elements randomly”. 


However, the same considerations do not apply if one's main aim is to explore 
some specific part of a person's construct space, e.g., the part concerned with 
favourite and disqualifying cues related to the judgment of potential. The overriding 
consideration in such a study is that of encouraging the teacher to mention the 
constructs that were associated with any favourite and disqualifying cues that s/he 
might use. Clearly the appropriate constructs would only emerge if a (TP̄) pupil (one 
who was overlooked despite high test score) or a (T̄P) pupil (one who was chosen 
despite low test score) was presented, in a triad, with appropriate contrasting pupils. 


An earlier part of our research had demonstrated that there were relatively few 
pupils who could be labelled as (TP̄) or (T̄P) in any one subject in any one school 
(Postlethwaite, 1984). Therefore, only if large numbers of triads were presented to 
each teacher, could one rely upon the random selection of pupils for each triad to 
generate the appropriate groupings. This recourse to large numbers of triads was 
itself rather unattractive, partly because it would have been unacceptably 
demanding on the time of the teachers, but more seriously because the presentation 
of large numbers of triads would have been exhausting for both parties involved in 
the elicitation interview (Pope and Keen, 1981). As a result, the long interviews 
might be expected to become more and more unrewarding as they continued. We 
were therefore drawn back to a position not unlike that adopted by Kelly himself, in 
his seminal work on Personal Construct Theory (Kelly, 1955, p. 273 et seq). He 
suggested using, and gave examples of, carefully chosen triads which could be 
offered to elicit constructs of a particular kind. (This idea has also been suggested by 
Easterby-Smith (1981) though he provides nothing more than the hint that it might 
be useful to present triads which will “bring out the greatest contrast in the elements 
available”.) Of course, as the kinds of constructs in which we were interested 
differed from those of which Kelly was writing, the details of the formulae for the 
construction of these special, or “critical”, triads were necessarily different in our 
case. Nevertheless the principle is the same. 
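
The cost of relying on random selection can be illustrated with a short calculation. The sketch below is purely illustrative: the class size of 30 and the figure of two overlooked pupils are hypothetical, not data from the study, and the function name is an assumption. It computes the chance that a randomly drawn triad contains at least one pupil of a rare type.

```python
from math import comb


def prob_triad_contains_rare(n_pupils: int, n_rare: int) -> float:
    """Chance that a random triad includes at least one rare-type pupil.

    n_pupils: number of pupils the teacher could be shown (hypothetical).
    n_rare:   number of those pupils of the rare type, e.g. overlooked
              despite a high test score (hypothetical).
    """
    if n_pupils < 3 or not 0 <= n_rare <= n_pupils:
        raise ValueError("need at least 3 pupils and 0 <= n_rare <= n_pupils")
    # Count the triads that contain no rare pupil, then take the complement.
    triads_without_rare = comb(n_pupils - n_rare, 3)
    return 1 - triads_without_rare / comb(n_pupils, 3)


# Hypothetical class of 30 pupils, 2 of whom were overlooked by the teacher.
p = prob_triad_contains_rare(30, 2)
print(f"{p:.3f}")
```

With these assumed figures fewer than one random triad in five would contain an overlooked pupil at all, and no more than that could pair such a pupil with suitable contrasts.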


As an example of the kind of critical triad that was relevant to our work, 
consider the following group of pupils (TP; TP̄; TP̄). These three pupils had similar 
high scores on the test measure of potential and so were all classified as (T). One also 
had high perceived potential (P), but two were not regarded as members of the top 
10 per cent group by their teacher (P̄). If, in the elicitation interview, the teacher 
grouped the pupils as (TP̄; TP̄) and (TP), one might expect that the resulting 
construct would relate to the way in which that teacher made errors of 
underestimation. Clearly other distinctions between the pupils in the triad might override this 
distinction based on test and teacher perceptions of potential and one would not, 
therefore, expect all triads of this type to be split by the teacher in the way which was 
suggested above. Nevertheless, when such a split was made, one can say that the pole 
which described the two TP̄ pupils was a characteristic of these two pupils who had 
been underestimated by the teacher in comparison with the test. This characteristic 
may have acted to disqualify these pupils from membership of the teacher’s more 
able group. One can speculate that it might have had more general application as a 
disqualifying cue. The first level of interpretation (that the cue was associated with 
disqualification for the individual pupils who were in the triad) can be made directly 
from the nature of the triad. The second (that it had general application as a 
disqualifying cue) requires further study. It is essentially a hypothesis that needs 
testing. The teacher’s ratings of pupils on the construct can be used as the data for 
such a test. 
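
The construction of this kind of critical triad can be sketched in a few lines of code. The sketch is an illustration under stated assumptions, not the procedure used in the study: the pupil names, the `Pupil` record and all function names are hypothetical, and a lower-case letter stands in for the barred labels (so "Tp" is a pupil nominated by the test but not by the teacher).

```python
from itertools import combinations
from typing import NamedTuple


class Pupil(NamedTuple):
    name: str
    test_high: bool     # nominated by the test (T)
    teacher_high: bool  # nominated by the teacher (P)


def pupil_type(p: Pupil) -> str:
    """Two-letter label; lower case marks 'not nominated' (the barred letter)."""
    return ("T" if p.test_high else "t") + ("P" if p.teacher_high else "p")


def underestimation_triads(pupils):
    """All triads made of one TP pupil and two Tp (overlooked) pupils."""
    return [
        triad
        for triad in combinations(pupils, 3)
        if sorted(pupil_type(p) for p in triad) == ["TP", "Tp", "Tp"]
    ]


def split_isolates_nominated(triad, similar_pair) -> bool:
    """True if the pair the teacher called similar is the two overlooked pupils."""
    overlooked = {p.name for p in triad if pupil_type(p) == "Tp"}
    return set(similar_pair) == overlooked


# Hypothetical pupils: Ann is TP, Ben and Cal are Tp, Dee is tp.
example_pupils = [
    Pupil("Ann", True, True),
    Pupil("Ben", True, False),
    Pupil("Cal", True, False),
    Pupil("Dee", False, False),
]
triads = underestimation_triads(example_pupils)
```

Only when `split_isolates_nominated` is true for a triad would the elicited pole be treated as a candidate disqualifying cue; any other split leaves the construct unclassified.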


To summarise: not only does the presentation of carefully designed “critical” 
triads encourage the teacher to mention the particular kinds of construct in which we 
were especially interested, it also allows us to make an a priori classification of the 
resulting construct. The appropriate constructs might emerge if enough randomly 
designed triads were presented but no a priori classification would then be possible. 
Because the “critical” triad approach gives rise to this possibility of forming 
hypotheses that can later be tested, it effectively shifts the personal construct 
technique from one of description and diagnosis to one of experimental exploration 
of the topic. We believe that this represents a novel way of putting personal 
construct methodology to work. 


Having established the idea of ‘‘critical triads’’, it is now appropriate to turn to 
the question of how many different kinds of critical triad can be designed which are 
relevant to a study of the relationship between test and teacher-based estimation of 
potential. To do so we can regard our problem as one of two-way classification 
(pupils are labelled as ‘‘of high potential’’ or not by the teacher, and as ‘‘of high 
potential’’ or not by the test). There are therefore four different types of pupil: (TP), 
(T̄P), (TP̄) and (T̄P̄). We can clearly make up four types of triad in which all three 
pupils are the same, and four types in which all three pupils are different from each 
other (e.g., TP, T̄P, TP̄). In each of these cases a priori classification of the 
constructs which emerged from the triads would be impossible. However, constructs 
emerging from the other triads can be classified on an a priori basis. These triads are 
listed in Table 2, together with the appropriate classifications. To see that these are 
indeed the only types of triad which can in principle produce relevant information, it 
should be realised that the TP pairs which form the construct pole should be the 
same, and different from the contrast TP pole. Table 1 shows at a glance how the 
critical triads emerge from these considerations. 


TABLE 1 
TYPES OF CRITICAL TRIAD 

                 |        Contrast pole 
Construct pole   |  TP  |  T̄P  |  TP̄  |  T̄P̄ 
TP               |  -   |  F   |  C   |  A 
T̄P               |  F   |  -   |  D   |  B 
TP̄               |  C   |  D   |  -   |  E 
T̄P̄               |  A   |  B   |  E   |  - 


TABLE 2 
DESCRIPTION OF CRITICAL TRIADS 

Triad type A1:  TP, TP, TP 
Triad type A2:  TP, TP, TP 
These triads give rise to constructs on which the teacher judged the triad pupils in the 
same way as the test. They may be constructs that are used as valid indicators of high 
potential when judging pupils generally. 

Triad type B1:  TP, TP, TP 
Triad type B2:  TP, TP, TP 
These triads give rise to constructs on which the teacher separated pupils with similar low 
test score. The pole at which T̄P was placed describes overestimated individuals. It may 
have served generally as a favourite cue in the teacher's judgment. 

Triad type C1:  TP, TP, TP 
Triad type C2:  TP, TP, TP 
These triads gave rise to constructs on which the teacher separated pupils of similar high 
test score. The pole at which TP̄ was placed describes underestimated pupils. It may have 
served generally as a disqualifying cue for the teacher. 

Triad type D1:  TP, TP, TP 
Triad type D2:  TP, TP, TP 
These triads give rise to constructs on which the teachers’ perceptions lead to reverse 
assessments from those based on the test. Favourite and disqualifying cues may be found 
at the opposite poles of these constructs. 

Triad type E1:  TP, TP, TP 
Triad type E2:  TP, TP, TP 
In these triads T and T̄ are distinguished by the teacher at the time of the elicitation 
interview but not when the original identification as not gifted was made. The teacher 
may originally have undervalued the characteristic at the TP̄ pole as an indicator of high 
potential, or this characteristic may have served as a disqualifying cue. 

Triad type F1:  TP, TP, TP 
Triad type F2:  TP, TP, TP 
In these triads T and T̄ are distinguished by the teacher at the time of the elicitation 
interview but not when the original identification as gifted was made. The teacher may 
originally have undervalued the characteristic at the T̄P pole as an indicator of low 
potential, or this characteristic may have served as a favourite cue. 

Throughout the table, the construct classification only applies if the triad is split so that the third member 
is identified as the ‘‘odd one out’’. 
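
The counting argument above can be checked with a short sketch. It is only an illustration under stated assumptions: pupils are coded as hypothetical (test_high, teacher_high) pairs, the letters A to F follow the verbal descriptions in Table 2, and the function names are ours rather than part of the original study. The sketch does not distinguish the subtypes numbered 1 and 2.

```python
from collections import Counter
from itertools import product

# Pupil types as (test_high, teacher_high); (True, False) stands for a pupil
# picked out by the test but not by the teacher. This coding is an assumption.
PUPIL_TYPES = list(product([True, False], repeat=2))


def classify(pair, odd):
    """Return the letter A-F for a triad made of two `pair` pupils and one `odd` pupil."""
    same_test = pair[0] == odd[0]
    same_teacher = pair[1] == odd[1]
    if not same_test and not same_teacher:
        # Both judgments differ: test and teacher either agree (A) or are reversed (D).
        return "A" if pair[0] == pair[1] else "D"
    if same_test:
        # The test does not separate the pupils but the teacher does.
        return "C" if pair[0] else "B"
    # The teacher's original selection does not separate the pupils but the test does.
    return "F" if pair[1] else "E"


def critical_triads():
    """All (pair_type, odd_type, letter) combinations in which pair and odd differ."""
    return [(p, o, classify(p, o)) for p in PUPIL_TYPES for o in PUPIL_TYPES if p != o]


triads = critical_triads()
letter_counts = Counter(letter for _, _, letter in triads)
```

Under these assumptions there are 4 x 3 = 12 pair/odd-one-out combinations, two for each of the six letters, which matches the twelve triad types listed in Table 2.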

The six types of triad presented in Tables 1 and 2 can be further subdivided. 
Triads of Type A can be expected to yield constructs on which teachers make 
judgments which correlate with the distinctions made by the test. Triads of Types B, 
C and D can be expected to yield constructs, individual poles of which have 
diagnostic value with respect to the judgment process of the teachers. For example, the 
T̄P poles from Type B triads may be operating as favourite cues for teachers and 
may give clues to some of the causes of errors of overestimation which they might 
make. Constructs from Type E and F triads could possibly have diagnostic value by 
bringing favourite or disqualifying cues to light, but they may also be diagnostic in a 
different sense for they may reveal pupil characteristics which teachers notice, but 
are failing to interpret as indicative of genuine differences in ability. Distinctions 
between the constructs from Type E and F triads which have these two different 
kinds of diagnostic value can be made by inspecting the actual characteristics which 
make up each construct, and by analysing in more detail the relationship between 
the ratings which teachers give to pupils on the construct and the teachers’ overall 
assessments of those pupils’ potential. Some aspects of this further analysis will be 
discussed later in this paper. 


It must be recognised that an elicitation procedure based only on these critical 
triads may produce a distorted grid, limited by our own perception of the nature of 
the problem (Yorke, 1978). Our major response to this criticism is that we have 
traded a representative survey of teachers’ constructs for an experimental study of 
constructs which behave as favourite or as disqualifying cues. However, to go some 
way to overcoming this criticism in a more direct manner, some triads made up of 
randomly chosen pupils were included in our elicitation interview together with 
triads fitting the specifications listed in Table 1. In presenting the triads, critical and 
randomly chosen triads were intermixed, and the overall order of presentation was 
chosen at random. 


Names of pupils to form the critical triads were selected in the following way. 
The pupils taught by a given teacher were each labelled as TP, T̄P, etc. When a pupil 
of a particular kind (e.g., TP) was needed in a triad, a name was chosen at random 
from the set of pupils of that kind who were taught by that teacher. When triads 
suiting all of the specifications in Table 1 had been constructed, they were listed in 
the order of presentation. They were then inspected and some changes were made, if 
necessary, to avoid repetition of pairs of names in successive triads, because Bender 
(1974) found that there was a significantly greater tendency to produce important 
constructs when successive triads were varied by two elements at a time, rather than 
one at a time. 
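
The selection procedure just described can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the paper says only that names were drawn at random and that the list was then inspected and changed where necessary, so the redraw loop, the function names and the type labels in the usage note are all assumptions.

```python
import random


def shares_pair(triad_a, triad_b):
    """True when two triads have at least two pupil names in common."""
    return len(set(triad_a) & set(triad_b)) >= 2


def build_triads(pupils_by_type, specs, rng, max_passes=100):
    """Draw names for each triad specification, then redraw clashing triads.

    pupils_by_type maps a pupil-type label to that teacher's pupils of the type;
    specs is the list of triad specifications in presentation order.
    """
    def draw(spec):
        chosen = []
        for kind in spec:
            # Never use the same pupil twice within one triad.
            pool = [name for name in pupils_by_type[kind] if name not in chosen]
            chosen.append(rng.choice(pool))
        return tuple(chosen)

    triads = [draw(spec) for spec in specs]
    for _ in range(max_passes):
        # A clash is a pair of names repeated in successive triads.
        clashes = [i for i in range(1, len(triads)) if shares_pair(triads[i - 1], triads[i])]
        if not clashes:
            break
        for i in clashes:
            triads[i] = draw(specs[i])
    return triads
```

A call such as `build_triads(pools, specs, random.Random(0))` returns one tuple of three names per specification. With very small classes the redraws may not remove every repeated pair, in which case manual adjustment, as in the paper, would still be needed.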


The intention was that the elicitation interview should present each teacher with 
15 different triads. Some teachers, who felt that they had exhausted their repertoire 
of constructs before this point had been reached, actually handled fewer than 15 
triads. Others dealt with several more. The time available for the interviews 
(normally about 90 minutes) enabled this scale of activity to be conducted in a busy, 
but not unduly rushed fashion. It is interesting to note that all of the teachers who 
went beyond 15 triads felt that they had nothing extra to offer after dealing with at 
most five additional triads. It seems that it would indeed have been of little value to 
attempt to work with very large numbers of randomly chosen triads. 


RESULTS 
(a) Preliminary information 
In all, 182 triads were presented to 11 physics teachers in seven schools: 140 of 
these triads were critical triads and 42 were randomly chosen ones. In 19 cases (12 
critical triads, and 7 randomly chosen ones) teachers found it impossible to divide 
the three pupils into similar pairs, and a contrasting odd-one-out. It is interesting to 
note that teachers were unable to split 8 per cent of the critical triads, whereas they 
failed to split 17 per cent of the random triads. This does suggest that the use of 
critical triads did, as predicted, add to the efficiency of the elicitation procedure. 


Of the 128 critical triads that were split by the teachers, 80 (62.5 per cent) were 
split as one might have expected from the design of the triad. When each type of 
triad identified in Tables 1 and 2 was analysed separately, the following results were 
obtained. 





TABLE 3 
CRITICAL TRIAD SPLITS ANALYSED BY TRIAD TYPE 

TYPE | NUMBER SPLIT | NUMBER SPLIT AS EXPECTED | SIGNIFICANCE (P) 
A1   |      11      |            9             |      0.001 
A2   |      10      |            6             |      0.12 
B1   |       7      |            5             |      0.04 
B2   |      12      |            8             |      0.02 
C1   |      11      |            9             |      0.001 
C2   |      12      |            6             |      0.18 
D1   |      10      |            7             |      0.02 
D2   |      13      |            7             |      0.10 
E1   |      12      |            8             |      0.02 
E2   |       7      |            4             |      0.17 
F1   |      18      |            8             |      0.20 
F2   |       4      |            3             |      0.11 





For an explanation of the significance calculations see the technical note at the end of this paper. 


We find that in six cases the results were significant at the 5 per cent level 
(0.01 < P < 0.05). In three further cases the probability that the observed number of 
‘expected splits’ would arise by chance was approximately 0.1, and in no case did it 
rise above 0.2. These results do provide some evidence that teachers were influenced 
by the design of the triads. More often than could be expected if they were splitting 
the triads at random, they chose, as the odd-ones-out, the pupils whose specification 
(in terms of T, T̄, P and P̄) would lead us to expect them to be chosen. Had this not 
been the case, there would have been little incentive to continue to refer to the 
implications of the triad design in discussing the results of the elicitation interviews. 
As it is, continued reference to the triad designs does seem to be appropriate. 


(b) Constructs elicited from physics teachers 

Limitations on space prevent discussion of all the results of this procedure. 
However we will present the full set of constructs elicited from Type B critical triads. 
These triads enabled us to explore dimensions on which teachers separated 
low-scoring pupils whom they did not regard as ‘‘of high potential’’ from similar 
low-scoring pupils who were identified as belonging to this category. 


For the set of constructs elicited from Type B triads we will first discuss the raw 
results, concentrating on the idea that, for the individual pupils in the triad, the 
descriptions offered at the two poles were associated with known combinations of 
test-based and teacher-based assessments of potential. In Section (c) we will go on to 
a more general analysis of these same constructs. 


Use of Personal Constructs 


If a Type B triad was split in the expected way, the pupil characteristic at the T̄P 
pole can be said to have been associated with an individual pupil (or pair of pupils) 
for whom the teacher’s judgment of potential exceeded that based on the pupils’ test 
scores. As teacher-identified pupils with test scores roughly two standard errors 
below the top 10 per cent cut-off score were chosen as T̄P pupils, the chance of these 
pupils having actual potential high enough to place them in the top 10 per cent group 
was small. The test-based classification can therefore be assumed to be correct for 
most of these pupils, despite some error in the test itself. (The whole issue of test 
error — which is clearly of great importance in studies of this kind — is discussed in 
detail elsewhere (Postlethwaite, 1984)). It follows that the most likely explanation of 
the difference between teacher and test in such a case is that the teacher was in error. 








TABLE 4 
CONSTRUCTS FROM TRIADS CONTRASTING FALSE HITS AND CORRECT REJECTIONS 
TYPE B (B1 AND B2) TRIADS 

        |                           CONSTRUCT                             | 
TEACHER | T̄P pole                         | T̄P̄ pole                        | Prob*         | % 
03(a)   | Absorbs day to day work         | Doesn't absorb day to day work | < 0.01        | 83% 
03(b)   | Enthusiastic                    | Not enthusiastic               | NS (P = 0.06) | 75% 
04H     | Confident in practical work     | Hesitant in practical work     | < 0.05        | [illegible] 
04M(a)  | Concentrates                    | Unable to concentrate          | < 0.01        | [illegible] 
04M(b)  | High overall ability in physics | Low overall ability in physics | < 0.05        | [illegible] 
05      | Good, conforming behaviour      | Non-conformist behaviour       | < 0.01        | [illegible] 
07B     | Good test results               | Low test results               | -             | - 
07G     | -                               | -                              | -             | - 
08      | High performance                | Low performance                | < 0.001       | [illegible] 
09      | Applies him/herself in lessons  | No drive in lessons            | < 0.001       | [illegible] 
10      | Conscientious                   | Very lazy                      | NS            | [illegible] 
11      | Good mathematically             | Not good mathematically        | < 0.01        | [illegible] 
12      | Doesn’t ask questions           | Asks questions                 | NS            | 42% 

* For explanations of these two columns, see the text. 
The precise nature of the characteristics under the T̄P pole is therefore worthy of 
careful study, as this will give clues to factors which may have led the teachers to 
make errors of overestimation in the case of certain individual pupils. 


With this interpretation in mind, we turn now to the results in Table 4 which 
were achieved when these Type B triads were presented. 


An interesting feature of the descriptions of the over-estimated, T̄P pupils in 
this table is that there is a clear reference to what might be regarded as ‘‘model pupil 
behaviour’’ (see, for example, constructs 03(a), 03(b), 04H, 04M(a), 05, 09, 10). 
Behaviour of this kind may of course be associated with high potential in some 
cases, but the indication from this work is that, for some pupils, teachers may have 
interpreted it in this way even when the test score would suggest that this 
interpretation was unduly optimistic. 


It is interesting to speculate on what was happening where teachers labelled a 
pupil with that low test score as a pupil of ‘‘high potential’’ (04M(b), 08). It may be 
that the process of interpretation of cues was happening so automatically at a 
non-verbal level, that the teachers were not aware that what they labelled as ‘‘high 
ability’’ was actually observed as, for example, ‘‘application’’ or ‘‘enthusiasm’’. 
Indeed, such confusion is sometimes to be expected. Teachers cannot observe ability 
directly, but only infer it from observed performance (often in only a limited sphere 
of activity). It is therefore likely to be difficult for them to distinguish between 
motivational and ability determinants of that performance, and they may thus misjudge the 
ability of pupils with unexpectedly high or low motivation. 


An alternative explanation is, of course, that in these particular triads the pupils 
so labelled by test and teacher were pupils whose test scores were in error. As we 
have pointed out, attempts were made, by choosing pupils with extreme test scores, 
to reduce the impact of this test error on the triadic elicitation procedure. However 
one must recognise that some individual pupils would have had a true potential 
which was considerably different from that indicated by the test score. 


(c) Testing the favourite cue hypothesis 

In the discussion so far, it is only for the pupils who were part of the triad from 
which a construct came that we have claimed that the characteristics that were 
elicited were associated with the known combination of teacher-based and test-based 
assessments of potential. For example, the individual pupil described by Teacher 05 
in Table 4 as having ‘‘Good, conforming behaviour’’ was over-estimated by that 
teacher. Such findings are worthwhile in their own right. However it would be far 
more interesting if the link between a given characteristic and ‘‘being selected’’ 
could be shown to be a general one for a particular teacher. To continue the previous 
example, suppose all pupils who were described as having ‘‘good conforming 
behaviour’’ were found also to have been selected by the teacher as ‘‘of high 
potential’’. This would indicate that the end result of that teacher's assessment 
procedure was consistent with the idea that good behaviour operated as a favourite 
cue for that teacher. 


This kind of information would be of theoretical interest as it would provide 
indirect evidence for the favourite cue model of person perception, and it would 
provide one example of the kind of favourite cue that might be operating in the 
context of judgments of potential. By stressing that the relationship between good 
behaviour and teacher-based selection was generalisable, it would also be of 
practical value to teachers in helping them to refine their overall assessment 
procedure. 


The purpose of this section, then, is to explore this question of the 
generalisability of the relationship between the pupil characteristics elicited through 
the critical triad procedure, and the selection (or not) of pupils as ‘‘of high 
potential’’ by the teacher. In doing so we are effectively testing the hypotheses about 
the role of these characteristics in relation to teachers’ overall judgments of potential 
which we were able to formulate because of the design of the triads. 


We tested the view that pupil characteristics associated with the over-estimation 
of the potential of individual pupils by a teacher might be related to the teacher’s 
general assessments of potential in a fashion consistent with that characteristic 
playing the role of a favourite cue. This was done in the following way. A 2 × 2 
contingency table was drawn up for the characteristic. This was based on teachers’ 
ratings of their pupils on the construct to which this characteristic related, and on 
their overall judgments of their pupils as ‘‘of high potential’’ or not. This can be 
clarified by quoting an example. Consider the first construct in Table 4. The 
characteristic ‘‘Absorbs day to day work’’ was a description of an over-estimated pupil. 
The ratings of pupils on this construct, together with the teacher’s overall 
assessment of the pupils’ potential, were used to make up the following table: 





TABLE 5 
ASSOCIATION BETWEEN CONSTRUCT RATING AND OVERALL ASSESSMENT OF POTENTIAL 
CONSTRUCT 03a TRIAD TYPE B 
Construct: ‘‘Absorbs day to day work’’ (1) to ‘‘Doesn't absorb day to day work’’ (5) 

Selected as     |               Number of pupils 
‘‘more able’’?  | Rated at the (1) pole | Not rated at the (1) pole 
Selected        |           5           |             3 
Not selected    |           1           |            18 


We can see immediately that 5/6 of the pupils (83 per cent) who were described 
as absorbing day to day work were also selected as more able. A Fisher Exact 
Probability test, applied to this table, established a significant association (P < 0.01) 
between being selected as ‘‘more able’’ and being rated at the pole ‘‘Absorbs day to 
day work’’. 
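
The figures quoted for Table 5 can be reproduced with a short sketch of the Fisher exact calculation. The paper does not say whether a one-tailed or two-tailed probability was used, so both are computed here. The function name and the rule used for the two-tailed value (summing all tables no more probable than the observed one) are assumptions of this illustration.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (one_sided_p, two_sided_p) for the 2 x 2 table [[a, b], [c, d]].

    one_sided_p is the chance of a top-left count at least as large as `a`,
    given fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    one_sided = sum(prob(x) for x in range(a, hi + 1))
    # Small tolerance so that ties with the observed table are included.
    two_sided = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return one_sided, two_sided


# Table 5: rows = selected / not selected, columns = rated at pole (1) / not.
one_sided, two_sided = fisher_exact_2x2(5, 3, 1, 18)
share_selected = 5 / (5 + 1)
```

For this table both probabilities come out below 0.01, in line with the value reported in the text, and `share_selected` rounds to 83 per cent.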


Comparable statistics were calculated for all of the constructs in Table 4 and the 
results are shown in the two right hand columns of that table. 


When the Fisher test indicated a significant association, when the direction of 
this association was consistent with the hypothesis being tested and when more than 
75 per cent of the pupils having the characteristic were actually identified as more 
able, it was clear that a high proportion of all of the children described in this way by 
the teacher had also been identified as more able and that this association was 
unlikely to be the result of chance alone. We concluded that this characteristic did 
bear a generalised relationship to the teacher's identification strategy that was 
consistent with the view that the characteristic had functioned as a favourite cue. In 
the absence of contradictory information we therefore felt justified in labelling the 
characteristic as a favourite cue. 
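
The three conditions just described can be written as a small decision rule. This is a sketch under stated assumptions rather than the authors' own procedure: the 0.05 significance threshold is inferred from the levels shown in Table 4, the function and argument names are hypothetical, and the probability is taken as an input from a separate Fisher test.

```python
def is_favourite_cue(a, b, c, d, p_value, alpha=0.05, min_share=0.75):
    """Apply the favourite-cue conditions to one construct pole.

    a: pupils with the characteristic who were selected
    b: pupils without the characteristic who were selected
    c: pupils with the characteristic who were not selected
    d: pupils without the characteristic who were not selected
    """
    if a + c == 0:
        return False  # nobody was rated at this pole
    share_with = a / (a + c)
    share_without = b / (b + d) if (b + d) else 0.0
    significant = p_value < alpha                # association unlikely to be chance
    right_direction = share_with > share_without  # characteristic goes with selection
    high_share = share_with > min_share           # more than 75 per cent selected
    return significant and right_direction and high_share
```

With the Table 5 counts (5, 3, 1, 18) and a probability below 0.01, the rule returns True. A characteristic with P = 0.06, or with exactly 75 per cent of its pupils selected, is rejected.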


(d) Examples of results obtained 

Similar procedures were applied to all of the constructs elicited from triads of 
Type B and F and those from Type D (which could be expected to yield favourite 
cues at one pole). Table 6 gives a list of all of the pupil characteristics which emerged 
as a pole of one of these constructs and which did indeed appear to be associated 
with teacher-based assessment of potential in a way that was consistent with the 
favourite cue model. 


TABLE 6 
CHARACTERISTICS OPERATING AS FAVOURITE CUES 

Triad Type | Teacher | Characteristic 
B          | 03a     | Absorbs day to day work 
D          | 03      | High self motivation 
B          | 04M(a)  | Concentrates 
B          | 04M(b)  | High overall ability in physics 
B          | 08      | * High performance 
D          | 08      | * Committed to the subject 
D          | 09(a)   | Made good progress 
B          | 11      | * Good mathematically 
D          | 11      | Work improving 
D          | 12(a)   | Serious about work 
D          | 12(c)   | * Content of written work good 

NOTES: 

(i) ‘‘Careful presentation’’ (Triad Type D, Teacher 04H(a), P = 0.001, % = 71%) and 
‘‘Enthusiastic’’ (Triad Type B, Teacher 03(b), P = 0.06, % = 75%) just failed to reach the levels which 
we set for accepting that a characteristic worked as a favourite cue. 

(ii) All pupils having characteristics marked * were selected as more able by their teachers. 


From the critical triad procedure, we were able to discover favourite cues for six 
out of 11 teachers. Four of these cues (from four different teachers) were related to 
pupils’ motivation, attitudes or personality. The other two teachers had ‘‘making 
progress’’ as a favourite cue. The other five favourite cues in the table related to 
overall assessments of ability or to mathematical ability. 


It is, of course, possible that the relationships which are obtained here reflect in 
part a correlation between the personal constructs and the pupils’ ability as 
measured by the tests. This is not very likely, at least for Type B constructs, since the 
original triad on which each construct-contrast dimension is based was chosen in 
such a way that there was no difference in the pupils' ability as measured by the test. 
Hence no correlation with personal construct dimensions can exist as far as these 
three persons are concerned. If the remaining pupils have introduced such a 
correlation it means that at least some of the cues in Table 6 are genuine indicators 
of ability. The nature of several of the cues suggests that this may be the case. It 
follows that use of a pole of one of these constructs as a favourite cue may not, in 
general, be a poor strategy for the teacher — though one should bear in mind that 
even for these constructs the strategy did lead to error in the case of some pupils (the 
pupils in the original triad). In theory we could have partialled out the effect of test 
ability but given the small numbers involved this did not seem to be a sensible 
procedure. 

The reader will recall that triads of Type C and D could be expected to yield 
constructs that contained pupil characteristics that operated as disqualifying cues 
(and that Type E constructs could also do so). We elicited such constructs and 
decided whether to accept a characteristic as a disqualifying cue by employing a 
procedure that was an exact parallel of that used for favourite cues (i.e., the 
characteristic must be significantly associated with non-selection as ‘‘more able’’ and 75 
per cent or more of the pupils having the characteristic must be overlooked). This 
led to a set of characteristics which did seem to fulfil roles as disqualifying cues. 
These are set out in Table 7. Again characteristics which met the stricter criterion 
that all pupils who had the characteristic were omitted from the teacher's list of 
more able pupils, are marked with an asterisk. 


TABLE 7 
CHARACTERISTICS OPERATING AS DISQUALIFYING CUES 

Triad Type | Teacher | Characteristic 
C          | 04H(a)  | * Low quality content in written work 
D          | 08      | * Not committed to the subject 
C          | 09      | Questions have to be directed to him to involve him 
D          | 09(a)   | * Fails to show early promise 
D          | 09(b)   | * Low general standards, e.g., of politeness 
E          | 09(b)   | Not involved in lessons 
C          | 11      | * Shows poor understanding of physics 
C          | 12(a)   | * Finds concepts in physics difficult 
D          | 12(c)   | * Content of written work weak 

NOTES: 

(i) ‘‘Reluctant to hand in homework’’ (Triad Type C, Teacher 10(b)), ‘‘Poor presentation of written 
work’’ (Triad Type D, Teacher 04H(a)) and ‘‘Slow to grasp ideas’’ (Triad Type D, Teacher 04H(b)) just 
failed to reach the levels which we set for accepting that a characteristic worked as a disqualifying cue. 

(ii) Several other characteristics were disqualifying cues in the sense that all pupils described in this 
way were not selected. However, in these cases the association between non-selection and being described 
as having these characteristics was not significant and, since the association may have arisen by chance, it 
is unwise to give too much weight to the disqualifying role of these cues. 

(iii) No pupils having characteristics marked * were selected as more able. 


From the critical triad procedure we found that five out of 11 teachers produced 
construct poles that operated as disqualifying cues. Three of these teachers had 
disqualifying cues that related to pupils’ motivation, attitude or personality; the 
other three had disqualifying cues related to pupils' ability in physics. 


Of course, pupil characteristics elicited from random triads could also have 
been operating as favourite or disqualifying cues. Both poles of the 63 random triads 
were therefore inspected according to the criteria used above. Ten out of 126 (8 per 
cent) characteristics were found to have operated as favourite cues. This should be 
compared with the fact that 11 of the 33 (33 per cent) characteristics elicited from 
relevant, correctly split critical triads were shown to have operated as favourite cues. 
Even when one takes into account that critical triads were not always split in the 
expected way (see Table 3) this still suggests that the presentation of appropriate 
critical triads was a more efficient way of eliciting favourite cues than the usual 
random triad procedure, followed by post hoc analysis. A similar argument leading 
to similar conclusions can be advanced in the case of disqualifying cues, though in 
this case the advantage of presenting critical triads was, in purely numerical terms, 
smaller. 


DISCUSSION 


The critical triad elicitation procedure appears to be an efficient way of eliciting 
constructs of a particular kind from teachers. Critical triad methodology also avoids 
the difficulties of the kind of post hoc analysis that is involved in searching the 
random triad results for ‘‘interesting’’ findings. We are reminded of Kerlinger’s 
comment (1973, p. 236): ‘‘Whenever hypotheses are formulated and systematically 
tested and empirical results support them, this is much more powerful evidence of 
empirical validity of the hypotheses than when ‘interesting’ (sometimes translate: 
‘support my predilections’) results are found after the data are obtained.’’ 


We therefore suggest that this experimental style of use of personal construct 
theory, based on the critical triad idea, may be of general application and hope that 
this paper might encourage others to apply it in other fields. 


TECHNICAL NOTE. In general each teacher handled one triad of each type, so each of the 
splits made within a particular type of triad could be regarded as independent of the other 
splits made within that type of triad. One can therefore calculate the probability of achieving 
exactly the observed number of ‘‘expected splits’’ within each triad type, on the basis of 
chance alone. A simple summing of such probabilities then enables one to calculate the 
probability of making this number, or more, expected splits on the basis of chance alone. This 
figure can be interpreted as a significance level, indicating the significance of the deviation of 
the observed result from that which would have been obtained purely on the basis of random 
selection from the triads. 


To illustrate the calculation of these significance levels we shall take the case of the A1 
triads as an example. If the 11 triads of this type were split in a random fashion, the chances of 
making nine of these splits in the expected way would be: 

$$ {}^{11}C_{9} \, (1/3)^{9} \, (2/3)^{2} $$

(This is based on the idea that the chance of any one way of splitting 11 triads so that nine were 
‘‘expected’’ splits and two were ‘‘unexpected’’ splits would be: 

$$ (1/3)^{9} \, (2/3)^{2} $$

and on the fact that ${}^{11}C_{9}$ gives the total number of ways in which nine ‘‘expected’’ splits could 
be made from 11 triads.) It follows that the significance level associated with nine ‘‘expected 
splits’’ (i.e., the chance of getting nine or more ‘‘expected splits’’ if the triads were split at 
random) can be calculated from the expression: 

$$ \text{Significance level} = \sum_{r=9}^{11} {}^{11}C_{r} \, (1/3)^{r} \, (2/3)^{11-r} $$
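
The summation in this note can be evaluated exactly with a few lines of code. This is a minimal sketch: the function name is ours, and the chance of 1/3 for an ‘‘expected’’ split is the value used in the note above (one of the three pupils being picked as the odd one out at random).

```python
from fractions import Fraction
from math import comb


def chance_of_expected_splits(n_triads, n_expected, p=Fraction(1, 3)):
    """Chance of at least n_expected 'expected' splits among n_triads random splits."""
    return sum(
        comb(n_triads, r) * p**r * (1 - p) ** (n_triads - r)
        for r in range(n_expected, n_triads + 1)
    )


# Type A1: 11 triads split, nine of them in the expected way.
level_a1 = chance_of_expected_splits(11, 9)
# Type F2: 4 triads split, three of them in the expected way.
level_f2 = chance_of_expected_splits(4, 3)
```

Here `level_a1` is exactly 1/729 (about 0.001) and `level_f2` is exactly 1/9 (about 0.11), which agree with the entries for these two types in Table 3.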
CHARACTERISTICS 
Summary. Nine-year-old children were presented with reading tasks in which 
psycholinguistic properties of words were manipulated. Although word imageability affected 
reading accuracy, grammatical class of words and regularity of spelling-to-sound 
correspondence did not. Regularity had no effects on lexical decision and judgments of 
phonology. Accuracy in reading aloud and judgments of phonology was substantially 
lower for non-words than for words. Only the best readers had well-developed skills in 
grapheme-to-phoneme sound conversion, and all appeared to rely on direct visual access 
when reading. A sentence comprehension task devised by Doctor and Coltheart (1980) 
was used for investigating the role of phonological coding in reading for meaning. 
Phonological effects were evident since children found it difficult to reject meaningless 
sentences which sounded correct. It is argued that the phonological effects could arise 
through post-lexical phonological coding. The results support models of reading 
acquisition which suggest that the development of non-lexical grapheme-phoneme 
conversion skills lags behind the development of direct visual access. 


INTRODUCTION 


An account of normal reading development must describe how the sub-component 
systems and processes possessed by the skilled reader are acquired. This paper 
reports experiments derived from cognitive psychology which aim to clarify this 
development. Information processing models have been fruitfully applied in 
attempts to understand skilled reading and the performance of acquired dyslexics 
(Patterson, 1981). Attempts at characterising reading acquisition in terms of these 
models are exemplified in the work of Barron (1980), Doctor and Coltheart (1980), 
Marsh et al. (1981) and Seymour and his colleagues (Seymour and Porpodas, 1980; 
Seymour and MacGregor, 1984). The model proposed by Morton and Patterson 
(1980) has been influential and assumes that words can be read by direct visual 
access either via the cognitive system involving comprehension, or directly from 
word recognition to the pronunciation store which does not involve comprehension. 
The third route is non-lexical and is achieved by use of a grapheme-phoneme 
conversion system. Research on skilled reading suggests that direct visual access via 
the cognitive system is the habitual strategy used by adults and that the non-lexical 
route is used as a back-up procedure in the reading of unfamiliar words (Seymour 
and MacGregor, 1984). 


It has been argued that the development of certain procedures for lexical access 
precedes the development of others. Doctor and Coltheart (1980) argued that in the 
early stages of reading lexical access is predominantly effected by the non-lexical 
phonological process, and Bradley and Bryant (1983) found a causal connection 
between early phonemic segmentation skills and subsequent progress in learning to 
read several years later. 


In contrast, others such as Biemiller (1970), Marsh et al. (1981), Seymour and 
Elder (1986), Francis (1984) and Frith (1985) have proposed that very young readers 
(some as young as 5 years) use direct visual access before grapheme-phoneme 
conversion skills develop. The four-stage model proposed by Marsh et al. (1981) 
suggests that qualitative differences in reading performance should be found in 
children at different stages. Stages 1 and 2 are characterised by visual access, which 
becomes more precise, and an absence of grapheme-phoneme conversion skills; thus 
children at these stages should be unable to read non-words and should not 
regularise words irregular in spelling-to-sound correspondence. Regularisation 
errors (i.e., reading steak as steek) should only begin to occur at Stage 3 where 
grapheme-phoneme skills develop. Although there are degrees of regularity, and 
word frequency interacts with the effects of regularity (Seidenberg et al., 1984), 
regularity effects in reading (i.e., poorer reading aloud of irregular words) in 
children have been reported by Barron (1980), Seymour and Porpodas (1980), Ellis 
(1984) and others. Thus three tasks were devised to examine performance on words 
with irregular spelling-to-sound correspondence and non-words. 


Frith (1985) suggests that the entries in the visual word recognition system of 
the early stage are more likely to be high imageable/concrete words and content 
words. Consequently, these should be easier to read for beginning readers than low 
imageable and function words. Effects of imagery on reading accuracy were 
reported by Jorm (1977) who found that poor readers aged 8 to 10 had difficulties in 
reading low imagery words, whereas competent young readers were unaffected by 
imagery. However, Baddeley et al. (1982) found that both normal 9-year-olds and 
12-year-old developmental dyslexics found low imagery words harder to read than 
high imagery words. Further tasks were therefore devised to assess the effects of 
imageability and grammatical class. 


Finally, since Doctor and Coltheart's (1980) conclusions were based on a 
sentence comprehension task rather than on reading words in isolation, this task was 
presented in Experiment 2. The subjects tested were 9 years old and included 
children having reading ages of approximately 7, 8 and 9 years. The words used in 
the experimental tasks were selected as far as possible from the pool likely to have 
been encountered in their reading matter. 


EXPERIMENT 1 


METHOD 


Subjects 

The children comprised all those in the second year of an inner city junior school 
who were present for most of the test session and had not been referred for remedial 
teaching (age range 8:10 to 9:07; mean age 9:03). Between 44 and 46 children 
provided complete data on the various tasks. All but two spoke English at home and 
the performance of the latter was indistinguishable from the rest. 


Stimulus materials 

The words selected were listed in the Ladybird Reading Scheme (McNally and 
Murray, 1968) and/or other early reading schemes (Elley, 1969) and were as far as 
possible matched for frequency according to Kucera and Francis (1967).¹ Each word 
was typed in lower case on the centre of a 5" x 3" card. All words and non-words 
used are listed in the Appendix. 


¹ The main criterion in word selection was that each word be common in the Ladybird Reading 
Scheme or in Elley's tables; matching on Kucera and Francis (1967), an adult American word frequency 
count, was a secondary aim. Mean word frequencies (Kucera and Francis, 1967) were as follows: Task 1: 
nouns = 261.7, adjectives = 237.1, verbs = 265.4, function words = 445.5 and high frequency 
function words = 1,814.0; Task 2: high imagery words = 99.8, low imagery words = 311.5; Task 3: 
regular words = 804.7, irregular words = 1,253.4. 
Task 1: Reading aloud words varying in grammatical class 

One hundred words, 20 from each of five categories (nouns, adjectives, verbs, 
function words and high frequency function words) were chosen. The words were 
short (three to six letters long) and mostly had one syllable. 


Task 2: Reading aloud words varying in imageability 

The second task contained 10 high imageable nouns, e.g., food, and 10 low 
imageable nouns, e.g., hour, which were selected from Vellutino and Scanlon's 
(1980) lists. 


Task 3: Lexical decision 

Forty words, half regular and half irregular (according to Wijk, 1966), and 40 
non-words three to six letters long were selected. The irregular words² had divergent 
pronunciations or lesser correspondences (e.g., pint, sword). The non-words 
contained common grapheme-phoneme correspondences, and half were 
homophones of real words (e.g., laik, brane), and half were not (e.g., keem, zole). 
The children had to sort items onto two marker cards, labelled “yes it is a word” 
and “no it is not a word”. 


Task 4: Reading aloud of “regular” and “irregular” words, homophonic and 
non-homophonic non-words 

The items from Task 3 were presented. 


Task 5: Silent test of phonology 

This task included 10 pairs of regular words (e.g., sale-sail), 10 pairs of 
irregular words (e.g., sum-some) and 10 pairs of non-words. Within each set half the 
items sounded identical (e.g., sail-sale, ake-aik) and half sounded different (e.g., 
sail-salt, ake-auk) and were matched for visual similarity. The children had to sort 
these items onto two marker cards labelled “same” and “different”. 


Standardised tests 

Several standardised tests were administered to the children. The English Picture 
Vocabulary Test (EPVT; Brimer and Dunn, 1962) and the Non-Readers Intelligence Test 
(NRIT; Young, 1978) were given to assess the contribution of general ability to 
performance on the tasks. The SPAR reading test (Young, 1976), which assesses single 
word and sentence comprehension, was given in order to obtain standardised reading 
ages for comparative purposes. 


Procedure 
The standardised tests were first administered as group tests. Then each child 
was individually tested on the experimental tasks during the summer term. 


For each task the order of items was randomly determined for each child. 
Examples were given for each task to ensure that the children understood the 
instructions. For Tasks 1, 3 and 5, they were told that the items were a mixture of 
proper words and nonsense words, and for Task 3 they were told: “Some are 
nonsense words. Don’t worry whether it’s a word or not, will you just try to let me 
know how it sounds”. 


Finally, the children’s teachers were asked to classify each child as above-average, 
average or below-average (Group 1, 2 or 3) in all-round reading 
performance, without knowledge of the children’s performance on the tests. The 
SPAR mean reading ages for the three groups were 9:5, 8:3, 7:2 respectively. 


² The irregular words were mostly “exception” words but seven were “strange” according to 
Seidenberg et al.'s (1984) criteria, and these along with three “exception” words in our lists were used by 
these authors as well. All but five of the regular words were consistent and the irregular neighbours of the 
latter were mostly rare words. 


RESULTS 
Analyses of accuracy on experimental tasks 
EPVT was not a significant covariate in the analyses of covariance for 
any of the experimental tasks; these analyses are therefore not reported. NRIT was a 
significant covariate for some tasks and is used as a covariate, with Reader Group 
(Good, Average, Poor) as a between-subjects factor. 


For all of the experimental tasks there was a significant main effect of Reader 
Group. In the interests of brevity the F values and contrast statistics are reported in 
Table 1. It can be seen that the performance of Poor Readers was inferior to that of 
the Average Readers on every task; Good and Average Readers differed only on 
some tasks involving non-words. 

The effects of the experimental variables, planned comparisons and their 
interactions with Reader Group are reported separately for each task. 


Reading words differing in grammatical class 

Grammatical class of words did not significantly affect reading, nor did it 
interact significantly with Reader Group. The means for correctly read words were 
respectively 19.67, 17.56 and 9.4 for Good, Average and Poor Readers. 


Reading words differing in imageability 

Table 2 presents the means for correctly read words of high and low 
imageability for the three Reader Groups. High imageable words were significantly 
easier to read than were low imageable words (F(1, 41) = 49.81, P < 0.001) and 
there was a significant interaction between imageability and Reader Group (F(2, 41) 
= 16.83, P < 0.001). Imageability was not significant for Good Readers. Average 
Readers read significantly more high imageable words than low imageable words 
(t_D(3, 41) = 3.28, P < 0.01) and this effect was significantly greater for Poor 
Readers (t(41) = 3.69, P < 0.001). 


TABLE 2 

MEAN CORRECTLY READ WORDS AS A FUNCTION OF IMAGEABILITY AND READER GROUP 

                     Imageability 
Reader Group       High       Low 
Good               9.93       9.80 
Average            9.24       8.24 
Poor               5.67       2.75 
Mean               8.50       7.27 

Maximum possible score is 10 


Lexical decision for regular and irregular words 

Mean correct lexical decisions for regular and irregular words are presented in 
Table 3. Performance on regular words did not differ significantly from that on 
irregular words and there was no significant interaction with Reader Group. 
TABLE 3 

MEAN CORRECT LEXICAL DECISIONS AND READING ALOUD SCORES FOR REGULAR WORDS, IRREGULAR 
WORDS, HOMOPHONIC NON-WORDS AND NON-HOMOPHONIC NON-WORDS 

Lexical Decision 
                 Regular    Irregular    Homophonic    Non-homophonic 
Reader Group     Words      Words        Non-words     Non-words 
Good             19.67      19.33        17.20         18.60 
Average          18.05      17.95        14.68         16.58 
Poor             13.92      13.92        10.33         12.33 
Means            17.50      17.35        14.37         16.13 

Reading Aloud 
                 Regular    Irregular    Homophonic    Non-homophonic 
Reader Group     Words      Words        Non-words     Non-words 
Good             19.60      19.07        16.47         15.40 
Average          17.84      16.95        13.00          6.74 
Poor              9.00       9.92         5.75          2.92 
Means            16.11      15.80        12.24          9.39 

Maximum possible score is 20 


Lexical decision for non-words 

Mean correct lexical decision for homophonic and non-homophonic non-words 
are also presented in Table 3. There was a significant effect of the type of non-word: 
homophonic non-words were more difficult to reject than non-homophonic non- 
words (F(1,43) = 14.53, P < 0.001), but there was no significant interaction with 
Reader Group. 


Reading regular and irregular words 

Mean correctly read regular and irregular words as a function of Reader Group 
are also presented in Table 3. There was no significant main effect, but there was a 
significant interaction between regularity and Reader Group (F(2,43) = 3.66, 
P < 0.05); this was caused by the slight, non-significant reversal of the regularity 
effect in Poor Readers. 


Although the words in our experiment were selected for their familiarity to 
children, they did vary quite markedly on standard measures of frequency. An item 
analysis was therefore performed in order to see whether the regularity effect was 
obscured by these differences. 


Difficulty values were obtained for each word, i.e., the proportion of children 
reading that item correctly. These were regressed on regularity and on three measures 
of frequency: Kucera and Francis (1967), an American children’s word count (Grade 3 
frequencies from Carroll et al., 1971) and a British adult frequency count (Hofland and 
Johansson, 1982). Estimates, adjusted for all predictors in the model, 
were obtained by the logistic modelling technique available in GLIM (Baker and 
Nelder, 1978). 
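
The difficulty value defined above is simply a proportion correct per item. As a minimal sketch of that first step only, the following uses invented responses (not the study's data) and hypothetical function names; the empirical logit shown is one common transformation applied to proportions before logistic-type modelling and is an assumption here, not a description of the GLIM analysis itself.

```python
import math


def difficulty_values(responses):
    """Return the proportion of children reading each item correctly.

    `responses` maps each word to a list of 0/1 scores, one per child.
    """
    return {word: sum(scores) / len(scores) for word, scores in responses.items()}


def empirical_logit(correct, total):
    """Log-odds of a correct reading, with 0.5 added to each count so that
    items read by every child (or by none) still give a finite value."""
    return math.log((correct + 0.5) / (total - correct + 0.5))


# Invented scores for eight hypothetical children (1 = read correctly).
responses = {
    "pint": [1, 0, 1, 1, 0, 1, 0, 1],
    "back": [1, 1, 1, 1, 1, 1, 0, 1],
}

values = difficulty_values(responses)
for word, proportion in values.items():
    n_correct = sum(responses[word])
    n_children = len(responses[word])
    print(word, proportion, round(empirical_logit(n_correct, n_children), 3))
```

Each item then contributes one difficulty value, which can be related to predictors such as frequency and regularity.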


All three frequency measures were significant predictors of difficulty, with the 
American Heritage count (Carroll et al., 1971) explaining the most variation; difficulty 
values were therefore regressed on this measure and Regularity. The partial R² for 
Frequency was 0.493, for Regularity 0.004, and for the interaction 0.001. Only Frequency 
was significant (t(38) = 9.31, P < 0.001). Thus regularity, adjusted for frequency, still 
has no effect within the range used in this experiment. 
Reading non-words 

Means for correctly read homophonic non-words and non-homophonic non- 
words are presented in Table 3. Homophonic non-words were significantly easier to 
read than non-homophonic non-words (F(1,43) = 64.19, P < 0.001) and there was 
a significant interaction between the type of non-word and Reader Group (F(2,43) 
= 8.36, P < 0.001). This is attributable to the fact that Good Readers showed no 
difference, whereas Average Readers found the homophonic non-words 
significantly easier to read (t_D(3,43) = 12.06, P < 0.01), as did Poor Readers 
(t_D(3,43) = 4.33, P < 0.01). 


A comparison of non-word and word reading is not reported because every 
child had a higher score on word reading than non-word reading. 


Silent test of phonology 

Mean correct judgments for pairs of regular words, irregular words and non- 
words are presented in Table 4. There was a significant effect of the type of pair 
(regular word, irregular word, non-word) (F(2,84) = 7.93, P < 0.001) and no 
significant interaction with Reader Group. Regular word pairs did not differ from 
irregular word pairs. Non-word pairs were judged less accurately than both regular 
(t(84) = 2.30, P < 0.05) and irregular word pairs (t(84) = 3.42, P < 0.001). 


TABLE 4 

JUDGING HOMOPHONY OF PAIRS OF REGULAR WORDS, IRREGULAR 
WORDS, AND NON-WORDS 

                 Regular    Irregular 
Reader Group     Words      Words        Non-words 
Good             8.50       8.79         7.50 
Average          7.63       7.63         6.32 
Poor             5.58       5.58         4.67 
Means            7.36       7.44         6.24 

Maximum score is 10 


Analyses of reading errors 
Word reading errors 

The reading errors in reading aloud regular and irregular words were classified 
into a number of categories devised for single word reading by Coltheart et al. 
(1980). The phonological category consists of errors which represent all the letters of 
the word but display rule violations or inappropriate rule use (e.g., flood 
pronounced to rhyme with “rude”). Derivational and/or inflectional errors are 
those which share the root morpheme with the stimulus word but are 
inappropriately affixed (e.g., writer > “writing”). The visual category includes 
responses which share 50 per cent or more of the letters of the stimulus word but 
were less phonologically accurate than the categories above (e.g., spear > “speak”). 
Other categories were devised for responses which could not be included in those 
above, i.e., preserving some letters from the stimulus word (but fewer than 50 per 
cent), preserving the first letter (only) of the stimulus, sounding out each letter. 
Errors which did not seem to resemble the stimulus words in any systematic way 
apart from approximate word length were termed unclassified. 
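
The 50 per cent letter-sharing criterion can be made concrete with a short sketch. This is an illustration under stated assumptions, not the scoring procedure actually used: shared letters are counted as a multiset overlap regardless of position, the proportion is taken relative to the stimulus length, and the function names are hypothetical. Phonological and derivational errors require a judgment about pronunciation and morphology and are deliberately left out.

```python
from collections import Counter


def shared_letter_proportion(stimulus, response):
    """Proportion of the stimulus letters that also occur in the response.

    Assumption: letters are matched as a multiset, ignoring position.
    """
    shared = Counter(stimulus.lower()) & Counter(response.lower())
    return sum(shared.values()) / len(stimulus)


def letter_overlap_category(stimulus, response):
    """Coarse category based only on letter overlap with the stimulus."""
    proportion = shared_letter_proportion(stimulus, response)
    if proportion >= 0.5:
        return "visual"
    if proportion > 0:
        return "fewer than 50 per cent of letters"
    return "unclassified"


# spear > "speak" shares s, p, e and a: four of the five stimulus letters.
print(shared_letter_proportion("spear", "speak"))
print(letter_overlap_category("spear", "speak"))
```

A positional measure would give different values for some responses, so the choice of overlap rule matters when borderline errors are classified.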

The distribution of these types of errors in Figure 1 shows that the incidence of 
errors differed in the three Reader Groups. Good Readers’ errors were typically 
phonological, with a few visual. For Average Readers phonological errors and visual 
errors were predominant, with some in the less than 50 per cent visual similarity 
category, and a few sounding out letters. The predominant type of error made by 
Poor Readers was the visual error. This group also made frequent errors which 
preserved less than 50 per cent of the letters, and were also more likely to sound out 
letters or to produce an unclassified response. Phonological errors constituted only a 
small proportion of their total errors. Derivational errors were infrequent and made 
only by the Poor Readers. 


FIGURE 1 
FREQUENCY DISTRIBUTION OF THE VARIOUS TYPES OF READING ERROR BY DIFFERENT GROUPS OF READERS 
[Figure not reproduced; axis label: FREQUENCY] 


Non-word reading errors 
The errors made when reading non-words were classified into words and others. 
The results for each of the three reader groups are presented in Table 5. This 
indicates that the Average and Poor Readers were far more likely than Good Readers 
to respond with a word to non-word stimuli. Again, omissions were relatively 
uncommon. 

TABLE 5 

FREQUENCIES OF WORDS AND NON-WORDS ERRONEOUSLY READ IN RESPONSE TO 
HOMOPHONIC AND NON-HOMOPHONIC NON-WORDS BY DIFFERENT READER GROUPS 

                               Stimuli 
                 Homophonic Non-words      Non-homophonic Non-words 
                       Responses                  Responses 
Reader Group     Words      Non-words      Words      Non-words 
Good              44         32             58         32 
Average           77         46            167         75 
Poor             173         43            192         79 
Totals           294        121            417        186 

DISCUSSION 


Grammatical class of words appeared not to affect the reading of the children 
tested and there was no indication of a specific function word difficulty. Word 
imageability did significantly affect performance for Average and Poor Readers, 
whereas Good Readers showed no effect of imageability. If, as Frith (1985) 
suggested, imagery effects characterise early stages of reading, the Average and 
Poor Readers display this characteristic. Ceiling effects in Good Readers preclude 
any conclusions for these children. 


Regularity of spelling-to-sound correspondence affected neither reading aloud, 
nor lexical decision, nor judgments in the silent test of phonology; this implies that 
the children have established a visual access code for the words used. 


Non-word reading confirmed the earlier findings of Firth (1972), Perfetti and 
Hogaboam (1975), and others, in that Good Readers read significantly more non- 
words accurately than did Average Readers, who in turn were better than the Poor 
Readers. For Good Readers there was no significant difference in reading aloud 
homophonic and non-homophonic non-words; and this cannot be attributed to 
ceiling effects. Average and Poor Readers found homophonic non-words much 
easier to read. The error classifications suggest why: as might be expected from 
children in the early stages, Average and Poor Readers displayed a very marked 
tendency to respond to words with visually similar words, and to non-words with 
words, as shown in Table 5. This strategy would lead to more correct 
responses for homophonic non-words, whereas producing a visually similar word 
for a non-homophonic non-word results in an error. 
For Good Readers 70 per cent of the errors on irregular words were 
regularisations, e.g., pint > /pɪnt/ (pronounced to rhyme with “tint”), which indicates 
an attempt to apply grapheme-phoneme rules. Although good at reading non-words, 
they had not mastered all the simple letter-sound conversion skills since they 
made errors on simple CVC non-words such as “sem”, “hoz” and “bon”. In 
contrast, regularisations constituted only 31 per cent of Average Readers’ errors and 
non-word reading was poor. Only 3 per cent of errors were regularisations for Poor 
Readers whose grapheme-phoneme conversion skills were minimal and seemed to be 
restricted to some limited graphemic parsing and phoneme assignment with an 
absence of blending skills. 


The above results led us to reconsider the view that a prelexical phonological 
route develops before the visual route (Doctor and Coltheart, 1980; Coltheart, 
1983). If one argues that reading depends on adequate prelexical phonological 
recoding, problems are posed by the Average Reader Group. They have achieved 
reasonable competence in reading (87 per cent correct on word reading; see Table 
3) and an average SPAR Reading Age of approximately 8:3, but are decidedly 
poorer at prelexical phonological recoding (only 34 per cent of non-homophonic 
non-words read correctly; see Table 3). 
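
The two percentages quoted for the Average Readers follow from the reading-aloud means in Table 3, where each item set has a maximum score of 20. The short check below only restates that arithmetic; the variable names are illustrative.

```python
# Reading-aloud means for Average Readers, taken from Table 3.
average_regular = 17.84
average_irregular = 16.95
average_non_homophonic = 6.74

ITEMS_PER_SET = 20  # maximum possible score for each item set

# Word reading pools the regular and irregular sets (40 words in all).
word_pct = 100 * (average_regular + average_irregular) / (2 * ITEMS_PER_SET)
nonword_pct = 100 * average_non_homophonic / ITEMS_PER_SET

print(round(word_pct), round(nonword_pct))
```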


EXPERIMENT 2 

The sentence comprehension task used with children aged 6-10 years by Doctor 
and Coltheart (1980) was presented. The younger children in their sample made 
significantly more errors on sentences which were incorrect but which sounded 
correct, i.e., which included an inappropriate homophone (e.g., I no your name). A 
similar but less marked effect was observed with sentences containing non-words 
which were homophonic with the correct word (e.g., I noe your name). This pattern 
of results was interpreted to indicate the use of a phonological access code in reading 
for meaning by young children. Since the magnitude of these effects diminished 
across the age range from 6 to 10 years they suggested that older children come to 
rely more on direct visual access. 


METHOD 


Subjects 
The 45 children from Experiment 1 were presented with the tasks of Experiment 2. 


Stimulus materials 

Pretest on homophones and non-words: The 12 pairs of homophones (e.g., 
wood, would) and 12 homophonic non-words (e.g., bloo) used in the sentences and 
printed on cards were randomly ordered. The subject’s task was to read each letter 
string aloud, to indicate if it was not a word, and to make up a sentence using it if it 
was. 


Sentence comprehension task: All the sentences devised by Doctor and 
Coltheart (1980) were presented to all subjects. The 96 correct sentences included 24 
which used a homophone, 24 matched controls and 48 fillers. The 96 incorrect 
sentences included 48 which sounded correct (homophones) and 48 which sounded 
incorrect. Within these sets, half of each contained a real word and half a non-word. 
These sentences were printed in two random orders in booklet form. 


The children were instructed to put ticks next to those which made sense and 
crosses next to those which did not make sense. It was stressed that the sentence had 
to have all words correctly spelled, and not merely sound correct. Sample sentences 
were presented and additional examples were completed by the children before the 
main set. 
RESULTS 


Pretest performance 

Mean errors (out of 12) in reading aloud and in usage of the homophones and 
non-words are presented in Table 6. 


TABLE 6 

MEAN ERRORS IN READING ALOUD AND USAGE OF HOMOPHONES AND 
NON-WORDS IN THE PRETEST FOR EXPERIMENT 2 

                    Reading Aloud              Usage 
Reader Group     Words    Non-words      Words    Non-words 
Good             0.33     1.73           1.37     0.60 
Average          1.22     3.17           2.75     3.06 
Poor             4.92     8.0            6.05     5.42 
Means            1.91     3.98           3.17     2.87 

Maximum possible score per cell is 12 


The three Reader Groups differ in their performance on this task, but as in 
Doctor and Coltheart's (1980) data the error rate on the pretests is considerably 
lower than it is on incorrect sentences which sound right. 


Judgments of correct sentences 


Mean errors on correct sentences with homophones and those without are 
shown in Table 7 for the three Reader Groups. 





TABLE 7 

MEAN ERRORS ON SEMANTICALLY CORRECT AND INCORRECT SENTENCES AS A FUNCTION OF READER 
GROUP 

Semantically Correct Sentences 
Reader Group     With Homophones    Without Homophones 
Good              2.53               1.80 
Average           2.00               1.30 
Poor             11.58               9.42 
Mean              4.73               3.64 

Semantically Incorrect Sentences 
                        With Words                  With Non-words 
Reader Group     Sound Right   Sound Wrong    Sound Right   Sound Wrong 
Good              7.07          1.13           3.13          0.73 
Average          11.44          0.67           6.50          0.44 
Poor             14.58         12.25          15.25         11.17 

Maximum possible score is 24 per cell 


Judgments of incorrect sentences 


Mean errors on these four types of incorrect sentences are presented in Table 7. 


A three-way analysis of variance yielded a significant effect of Reader Group 
(F(2,42) = 27.67, P < 0.001). Subjects were less accurate at rejecting sentences with 
words than those with non-words (F(1,42) = 34.84, P < 0.001). Sentences which 
sounded correct were more difficult to reject than those which sounded wrong 
(F(1,42) = 64.24, P < 0.001). 

Reader Group interacted significantly with Presence of Non-word (F(2,42) = 
6.33, P < 0.005) and with Sound (F(2,42) = 6.41, P < 0.005). The two-way 
interaction between Presence of Non-word and Sound was also significant (F(1,42) 
= 11.57, P < 0.001). Mean errors representing these interactions are presented in 
Table 8. The three-way interaction between Reader Group, Presence of Non-word 
and Sound was highly significant (F(2,42) = 9.02, P < 0.001). 


TABLE 8 

MEAN ERRORS ON VARIOUS TYPES OF SEMANTICALLY INCORRECT SENTENCES WITH T VALUES AND P FOR 
PLANNED COMPARISONS 

Semantically Incorrect Sentences 

Reader Group     With Words     With Non-words     t(42)     P 
Good              4.10           1.93              4.54      *** 
Average           6.06           3.47              5.92      *** 
Poor             13.42          13.21              0.39      NS 

Reader Group     Sound Right    Sound Wrong        t(42)     P 
Good              5.10           0.93              3.71      *** 
Average           8.97           0.56              8.22      *** 
Poor             14.92          11.71              2.56      * 

                 With Words     With Non-words     t(84)     P 
Sound Right      10.82           7.71              4.32      *** 
Sound Wrong       3.91           3.40              0.71      NS 

* P < 0.05, ** P < 0.01, *** P < 0.001 


Planned comparisons of the Reader Group main effect indicated that Good 
Readers made fewer errors than Average Readers (t(42) = 7.02, P < 0.001) and 
Average Readers fewer than Poor Readers (t(42) = 6.06, P < 0.001). 


The t values and P for the simple main effects are reported in Table 8. The 
Reader Group and Presence of Non-word interaction seemed to reflect the fact that, 
for Poor Readers only, non-word sentences were not easier to reject. The Reader 
Group and Sound interaction arose because Average Readers were disproportionately 
affected by sentences which sounded correct, though all the groups had 
difficulty in rejecting these sentences. The interaction of Presence of Non-word with 
Sound occurred because when sentences sounded correct fewer errors were made on 
those having non-words. 
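
The per-group means reported in Table 8 can be recovered by collapsing the Table 7 cell means for incorrect sentences over the other factor. The sketch below illustrates this for the within-group rows only; the dictionary layout and function name are illustrative, and the Sound by Presence of Non-word rows are omitted because they pool over groups of unequal size, which the text does not state.

```python
# Cell means for semantically incorrect sentences, from Table 7,
# keyed by (item type, how the sentence sounds).
TABLE_7 = {
    "Good": {
        ("words", "right"): 7.07, ("words", "wrong"): 1.13,
        ("non-words", "right"): 3.13, ("non-words", "wrong"): 0.73,
    },
    "Average": {
        ("words", "right"): 11.44, ("words", "wrong"): 0.67,
        ("non-words", "right"): 6.50, ("non-words", "wrong"): 0.44,
    },
    "Poor": {
        ("words", "right"): 14.58, ("words", "wrong"): 12.25,
        ("non-words", "right"): 15.25, ("non-words", "wrong"): 11.17,
    },
}


def collapsed_mean(cells, factor, level):
    """Average the two cells sharing `level` on one factor.

    factor 0 is item type (words / non-words); factor 1 is sound (right / wrong).
    """
    values = [mean for key, mean in cells.items() if key[factor] == level]
    return sum(values) / len(values)


for group, cells in TABLE_7.items():
    print(
        group,
        "with words:", collapsed_mean(cells, 0, "words"),
        "with non-words:", collapsed_mean(cells, 0, "non-words"),
        "sound right:", collapsed_mean(cells, 1, "right"),
        "sound wrong:", collapsed_mean(cells, 1, "wrong"),
    )
```

Agreement with Table 8 to within rounding is a useful check that the two tables were transcribed consistently.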


The three-way interaction was examined by comparing the effects of Sound at 
fixed levels of Reader Group and Presence of Non-word. For sentences containing 
words only, both Good and Average Readers made more errors on sentences that 
sounded correct (t(84) = 4.76, P < 0.001 and t(84) = 9.46, P < 0.001, 
respectively) whereas the performance of Poor Readers did not differ on these types 
of sentences. In the case of sentences containing non-words, Good Readers made 
few errors and their performance on sentences that sounded correct did not differ 
significantly from those that sounded incorrect. In contrast, both Average and Poor 
Readers made significantly more errors on sentences that sounded correct than on 
sentences which did not (t(84) = 5.32, P < 0.01 and t(84) = 2.93, P < 0.01, 
respectively). 
DISCUSSION 


Apart from overall lower levels of accuracy in performance in our study, the 
pattern of performance obtained was very similar to that found by Doctor and 
Coltheart (1980).³ Error rate on correct sentences was very low. Incorrect sentences 
which sounded correct were harder to reject than sentences which sounded incorrect. 
Sentences containing only words were harder to reject than those containing a non- 
word. The high error rate on sentences with non-words which sounded correct led 
Doctor and Coltheart (1980) to postulate the use of a pre-lexical phonological code 
in reading sentences for meaning. 


A suggestion by Henderson (1982) led us to consider an alternative explanation 
in terms of post-lexical phonological coding. We suggest that the sentence 
comprehension task requires two processing stages: the first involves direct visual 
access for the words in the sentence, the second requires the operation of a post- 
lexical phonological code used for sentence comprehension (Kleiman, 1975). Saffran 
(1982) and Shallice (1979) have also postulated the existence of a post-lexical 
phonological code, used by working memory, which serves to retain the surface 
structure of speech during sentence comprehension. This mechanism would cause 
errors on sentences which sound correct because the post-lexical phonological code 
cannot differentiate homophones. 


Thus, if the first stage is an attempt at visual access, the Good Readers, with a 
well-specified lexicon, were able to judge that an item was a non-word when it failed 
the lexical search. This obviates the need for further processing of the sentence. 
Thus, they were unaffected by whether or not the non-word is homophonic with the 
appropriate word. 


Average and Poor Readers with fewer words represented, and in less detail, in 
the visual word recognition system, were less likely to cease processing sentences 
with non-words at the first stage. These children tend to access visually similar real 
words when presented with non-words. Thus, they are more likely to access the 
homophone which should have been in the sentence for sentences containing 
homophonic non-words than for sentences which did not sound (or look) correct. 


The second stage involves accessing the cognitive system to retrieve word 
meanings and the establishment of a post-lexical phonological code to enable 
syntactic processing to occur. The Average and Poor Readers could make errors at 
this stage in two ways. They might access the inappropriate meaning of the 
homophone or they might access the cognitive system from the post-lexical 
phonological code. In either case the meaning appropriate to the contextual 
information provided by the sentence would be activated. 


Good Readers, as indicated by their superior pretest performance, were able to 
access the appropriate meanings of homophones, and the occasional errors could 
have been caused by failure to check whether the entry in the post-lexical 
phonological code was the appropriate one. 


The possibility that visual similarity caused their results was considered by 
Doctor and Coltheart (1980), since the non-words in sentences which sounded 
wrong were also visually rather different from the word they replaced. Thus, in a 
fourth experiment, they equated the graphic similarity (Weber, 1970) of the non-words 
to the target word. This made no difference to performance, allowing them to 
maintain that the errors arise as a result of pre-lexical recoding. This conclusion rests 
on the assumption that the graphic similarity index used adequately represents the 
degree of visual similarity. However, recent evidence by Monk and Hulme (1983) 
indicates that word shape is also an important variable for adults in a proof-reading 
task. Differences in shape produced by the presence of ascenders or descenders in 
words affected error rate over and above the number of letters in common with the 
correct word. 


³ We did, however, find that correct sentences with homophones were significantly more difficult 
than correct sentences with no homophones. Their data showed a non-significant trend in this direction. 


Inspection of the non-words in Doctor and Coltheart’s Experiment 4 indicates 
that while 7/20 experimental non-words differ from the appropriate homophone in 
the presence of letters with ascenders or descenders (e.g., bie and buy), such 
differences occurred for 17/20 control non-words (e.g., fil and for); thus, visual 
similarity may have caused the homophone effect. 
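
The word-shape comparison described above can be sketched in a few lines. This is an illustration rather than the index used by Monk and Hulme (1983) or the procedure behind the counts just reported: the ascender and descender letter sets are an assumption about a typical lower-case typeface, shapes are compared position by position, and the function names are hypothetical.

```python
# Assumed letter classes for a typical lower-case typeface.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def word_shape(word):
    """Return the outline of a word: A = ascender, D = descender, x = neither."""
    return "".join(
        "A" if letter in ASCENDERS else "D" if letter in DESCENDERS else "x"
        for letter in word.lower()
    )


def differs_in_shape(non_word, target):
    """True when the two strings have different ascender/descender outlines."""
    return word_shape(non_word) != word_shape(target)


# The two pairs cited in the text, plus one pair with identical outlines.
for non_word, target in [("bie", "buy"), ("fil", "for"), ("wun", "one")]:
    print(non_word, word_shape(non_word), target, word_shape(target),
          differs_in_shape(non_word, target))
```

Under these assumptions both cited pairs differ in outline, which is why a letter-overlap index alone may not equate visual similarity across experimental and control non-words.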


CONCLUSIONS 


Our data do not support the suggestion by Doctor and Coltheart (1980) that less 
skilled readers initially use a phonological access code and only switch over to the 
use of direct visual access later. The Average and Poor Readers in our study had 
poor grapheme-phoneme conversion skills, were unaffected by regularity in spelling- 
to-sound correspondence, and appeared to be using direct visual access to read. 
Thus, their performance resembles that of the much younger readers in whom 
reading appears to depend on direct visual access (Francis, 1984; Seymour and 
Elder, 1986). We have also argued that the phonological coding effects observed in 
Doctor and Coltheart’s (1980) study could arise post-lexically. Consequently, our 
data are more consonant with models which suggest that effective phonological 
access lags behind visual access in the development of reading. 

APPENDIX 
Word Lists 


Task 1 

Nouns: bed, road, army, art, ball, food, ground, party, river, week, gun, club, life, home, body, 
case, church, day, farm, hair. 

Adjectives: red, cold, fine, hot, easy, bad, deep, late, real, short, daily, full, blue, far, hard, wide, 
best, high, long, black. 

Verbs: paid, sent, read, shown, told, got, feel, hear, make, went, found, kept, led, lost, meet, came, 
help, gave, ask, bring. 

Function Words: soon, else, ago, ever, thus, yes, nor, above, across, under, upon, how, why, along, 
down, much, until, here, away, yet. 

High Frequency Function Words: into, will, may, such, this, from, than, her, and, but, when, not, 
can, all, with, that, then, about, for, these. 

Task 2 
High Imageable: egg, car, snow, book, mother, shoe, food, star, branch, cloud. 
Low Imageable: age, air, year, hope, thought, life, hour, east, chance, health. 


Tasks 3 and 4 

Regular: back, cross, get, dance, had, pine, base, spear, out, can, take, turn, tooth, did, ear, sort, 
hand, make, first, think. 

Irregular: aunt, laugh, two, break, put, pint, sign, sword, who, you, come, love, flood, was, eye, 
move, head, come, their, would. 

Homophonic Non-words: werk, goze, wor, moovy, hoo, joak, burd, noase, owt, doo, bloo, laik, 
yeer, trane, grene, phog, brane, bie, beeze, wun. 

Non-homophonic Non-words: phid, kweed, zea, voard, bon, jead, zole, prode, foo, hoz, cade, 
keem, vound, oin, sem, foun, hule, nain, streed, scang. 


Task 5 
Same Regular Pairs: plain-plane, sail-sale, sighs-size, tacks-tax, tail-tale. 
Different Pairs: plain-plant, sail-salt, size-signs, tax-talks, tail-talk. 
Same Irregular Pairs: eye-I, sew-so, knows-nose, some-sum, piece-peace. 
Different Pairs: eve-I, new-no, nose-knots, home-hum, piece-pence. 
Same Non-word Pairs: ake-aik, leam-leem, korm-kawm, nime-nyme, doim-doym. 
Different Pairs: ake-auk, lerm-leem, korm-kaim, nime-nume, doim-doum. 

THE ROLE OF PHONEMIC AWARENESS IN THE READING 
STYLE OF BEGINNING READERS 


By IAN STUART-HAMILTON 
(Department of Psychology, University of Manchester) 


Summary. Phonemic awareness is highly correlated with the reading abilities of young 
schoolchildren. However, no-one has tested how it influences reading. Two experiments 
compared 20 children possessing phonemic awareness with 20 matched for reading and 
chronological age who lacked it. In Experiment 1, the children's ability to detect changes 
in the graphemic structure of isolated words was assessed. In Experiment 2, the children's 
reading from their schoolbooks was examined by miscue analysis. In comparison with the 
non-aware group, the aware group was significantly more sensitive to changes in 
graphemic structure, made greater use of graphemic cues and made fewer ‘‘nonsense’’ 
errors in ‘‘normal’’ reading. 


INTRODUCTION 


This paper attempts to link two previously disparate areas of reading research: 
reading style and phonemic awareness in beginning readers. Phonemic awareness 
may be defined as the conscious realisation that words can be decomposed into 
discrete units (phonemes) and that words have phonemes in common, coupled with 
the ability to isolate and manipulate phonemes in words. It has been known for 
some years that phonemic awareness is highly correlated with the reading abilities of 
beginning readers (Bradley and Bryant, 1978, 1983; Morais et al., 1979; Lewcowicz, 
1980). This was emphasised by Bradley and Bryant (1983) who found that the better 
a child's awareness at pre-school level, the better his or her reading ability at 7 years. 
Explanations of why phonemic awareness and reading ability are related are usually 
that the acquisition of awareness reveals the logic of the alphabetic system to the 
child (Rozin and Gleitman, 1977). Put simply, if a child does not know what 
phonemes are, then he or she cannot know what letters represent. Though a child 
may use letters as visual cues in reading (though not efficiently, it is argued) he or 
she is blind to the logical lynchpin of written English. 


This is a plausible argument, but it has never been empirically examined. 
Therefore, it was decided to test if phonemic awareness does affect a child's 
processing of letter (grapheme) information. This was to be done by examining its 
effect on reading style. 


‘‘Reading style’’ is shorthand for any measure of relative reliance on reading 
sub-skills. A popular measure of this in young readers is error analysis, and the best- 
known study in this field is probably that of Biemiller (1970). Biemiller heard a 
group of American 1st-graders read at intervals through their first year at school, 
and recorded any mistakes that they made. The errors were then classified into four 
types: (1) insertion (word added which was not in the text); (2) non-response 
(‘‘N-R’’, the child stops reading and/or asks what the next word to be read is); (3) 
omission (a word is ‘‘skipped over’’ with no apparent pause); and (4) substitution 
(word is misread). The substitution errors were further classified into those which 
had the same general graphemic structure as the word which should have been read, 
and which made the same contextual sense as the target word in the text (graphemic 
and contextual substitutions); those which only had the same graphemic structure 
(pure graphemic substitutions); those which only made contextual sense (pure 
contextual substitutions); and substitutions which made no sense at all (nonsense 
substitutions). Biemiller found that initially children’s errors were largely contextual 
substitutions. They then entered a ‘‘N-R phase’’ where over 50 per cent of the errors 
were N-R miscues. At the same time there was a rise in the number of pure 
graphemic substitutions. In a third phase, errors were chiefly graphemic and 
contextual. Cohen (1975) effectively replicated Biemiller's findings. But Dodd 
(1982) using the same paradigm, did not. Dodd found no evidence for the N-R 
phase, and pure graphemic substitutions remained constantly low throughout the 
year. Instead, throughout the year there was a steady rise in the number of 
contextual and graphemic substitutions. The differences between Dodd’s, 
Biemiller’s and Cohen's findings could be due to different subject groups in 
different education systems (Dodd's subjects were English, the other researchers' 
were American), or to the low inter-observer reliability in reading style studies 
(Hood, 1976). There is, nonetheless, an agreement that early reading development is 
characterised by an increasing dependency on graphemic information from an initial 
heavy use of contextual information. 


Dodd lent further support to this hypothesis in a ‘‘posting and matching’’ task 
(full details in the Method section of this paper). Dodd gave children the task of 
sorting through some printed cards for copies of a target word. The pack of cards 
comprised three copies of the target word and 11 graphemically altered versions of it 
(i.e., some of the letters were changed). Dodd found that with increasing 
age/reading ability children scored more highly on the task and were able to detect 
increasingly subtle changes in graphemic structure. Concluding the study, Dodd 
questioned whether the observed changes were due to an ability to deal with 
increasingly complex stimuli, or to the ‘‘realisation of the importance of letter 
identity”. 

Thus, there is evidence for a strong role of graphemic processing in early 
reading, and it is now time to examine the role of phonemic awareness in these 
events. To do this, it was decided to compare children’s performance on an 
awareness test with that on tests of reading style. The test chosen was one of the 
author’s own devising. He had earlier found (Stuart-Hamilton, 1984) with a sample 
of 100 infant school children that phonemic awareness test scores correlated with 
Carver Word Recognition Test scores at r = 0.79 and with Schonell Spelling Test 
scores at r = 0.72 (both significant at P < 0.001). To test reading style Dodd’s 
measures of error analysis and Dodd’s posting and matching task were used. In 
order to give as clear a picture of the effect of phonemic awareness as possible, it 
was decided to compare groups with no awareness with groups with some. To ensure 
that any differences were not due to differences in reading ability and/or age, the 
aware and non-aware children were matched for these factors. 


METHOD 

Subjects 

The subjects were the entire reception classes of four infant schools in the 
north-west of England, and had been in school 1-1.5 terms when tested. All the 
children were native English speakers. At the teachers’ request, SES data on 
individual children were not recorded. All the schools used an ‘‘eclectic’’ reading 
teaching system, (i.e., a mixture of phonics and whole word methods). In total, 154 
children were tested, from whom 20 matched pairs were obtained. Children with 
assessed hearing/speech impediments were excluded. 


Materials and procedure 

The tests used were the Carver Word Recognition Test (Carver, 1970); the 
two phonemic awareness tests of the author's own devising; Dodd's posting and 
matching tasks; and the error analysis. 


The Carver test 

This consists of 50 items; in each item the child is told a word in isolation and in 
a high context sentence, and is then required to locate its printed representation 
from a list of alternatives. 


The phonemic awareness test 

The test was in two sections. The first of these, termed ‘‘Input’’, was in itself in 
two parts. (The terms ‘‘input’’ and ‘‘output’’ imply judgments on the children’s 
response mechanisms. This is not intended. Better terms would be ‘‘comparison’’ 
and ‘‘generation’’, but for reasons of compatibility with previously-published 
studies the present terminology is retained.) The first part of the Input section, 
‘‘Input Rhyme’’, required the child to judge if two spoken words ‘‘ended the 
same’’. There were two groups of words presented, with five target pairs and five 
non-target pairs in each (see Appendix 1). The target pairs ended the same, whilst 
the non-targets did not. The full targets had more letters in common than the half 
targets. Children had been found to be insensitive to this difference in earlier 
studies. However, to preserve continuity with these, this procedure was continued. 
The full non-targets had medial letters in common to ensure that the children were 
not making similarity judgments on the medial letters. 


The second part of the Input task, ‘‘Input Alliteration’’, required the 
child to judge if two words began with the same sound. Two groups of words were 
used: full targets coupled with full non-targets, and half targets coupled with half 
non-targets (see Appendices 1 and 2). The similarities between the words were 
identical to those in the Input Rhyme section, except that they now lay at the start of 
the word. 


The second section of the phonemic awareness test, termed ‘‘Output’’, was also 
in two parts. The first, ‘‘Output Rhyme’’, required the child to say a word which 
‘‘ended the same’’ as a presented spoken word. Five real consonant-vowel- 
consonant words were presented in total; these are listed in Appendix 2. The second 
part of the ‘‘Output’’ task, ‘‘Output Alliteration’’, required the child to say a 
word which began the same as a spoken example. A list of examples is given in 
Appendix 2. 


Procedure 

The ‘‘Input Rhyme’’ task was introduced to the child by asking if a spoken 
word pair ‘‘ended the same’’ (e.g., ‘‘do cat and car end the same?’’). This first pair 
to be presented always consisted of words which ended the same, though not of 
words which would appear in the test. If the child said that they did, then further 
practice items were given. To ensure that he or she was not responding purely to the 
phrase ‘‘the same’’, amongst these were some pairs which began the same (e.g., 
‘‘cat, car’’). If the child, even after correction, said that three examples of these 
same-beginning words ‘‘ended the same’’ then a score of zero was given. 


Children who failed because they were responding purely to a similar sounding 
word were quite rare (four of the initial sample). Much commoner was the child who 
responded randomly, persistently replied ‘‘yes’’ or ‘‘no’’, or simply expressed 
incomprehension. Here remedial measures were attempted. Necessarily, the precise 
procedures differed between children, but they included the use of different 
terms in explaining the task, the breaking of the task into sub-components (e.g., 
asking the child to identify just the final letter of one word), and working through an 
example with him or her. If the child still answered incorrectly (i.e., was still 
persistently saying ‘‘yes’’ or ‘‘no’’, or answered randomly, or was still telling the 
experimenter that he or she did not know) then the test was terminated, and the child 
received a score of zero. 

Following this training phase, the phonemic awareness test proper began. The 
word pairs in the Input Rhyme task were presented in two counterbalanced blocks. 
The ‘‘Full’’ group consisted of the full rhyme and full target pairs (see Appendices 1 
and 2). Within this block, the words were presented randomly. The ‘‘Half’’ block 
(the half pairs and the half non-targets), was similarly presented. The aim of this 
pairing of targets and non-targets was to ensure the child was not responding to 
similarities in the medial sounds of the words. 


An identical procedure was used in the presentation of the Input Alliteration 
task, except that the child was here required to judge if a pair of words ‘‘began the 
same’’. 


The output rhyme task was introduced by asking the children to think of a word 
which ‘‘ended the same’’ as a spoken example. This example was not one of the test 
items. Should the children produce a correct answer, then they were given some 
more practice examples, and then the test items. Should they respond incorrectly, 
then explanatory measures were again attempted. These of necessity varied from 
child to child, but included changes of terminology (e.g., ‘‘finishes’’ instead of 
‘‘ends’’), attempts to break the test into smaller components (e.g., to say what letter 
the word ended with), and working through an example with the child. (Unlike the 
Input task, a few children responded to this treatment). Children who began to 
respond correctly were treated from then on like children who had responded 
correctly from the first example. If they persistently failed to respond, then this 
section of the test was terminated, and they were given a score of zero. 


The Output Alliteration task was conducted in an identical manner, except that 
the child was required to produce a word which ‘‘started the same’’. 


Dodd's word recognition (‘‘posting’’) test 

The child was required to judge if various letter strings were the target word or 
not. Five target words were used: cow, hen, mouse, donkey, and elephant (‘‘hen’’ 
was substituted for ‘‘dog’’, which was used in Dodd's original experiment, because it 
was suspected that the children were reading ‘‘dog’’ as a ‘‘sight’’ word). There were 
11 transformations of each target word, and these are listed in Appendix 3. (The 
‘‘jumbled’’ condition replaces a ‘‘transformation’’ condition in Dodd's experiment, 
which contained meaningless letter strings of the same length as the target.) The 11 
transformations of each word, plus five copies of each target word, were printed in 
lower case letters separately on 4” by 6” cards, giving 15 test cards, and one example 
card for each target word. Each target was dealt with separately. 


The test procedure was as follows. The 11 transformation cards were shuffled, 
and the four target cards folded into the pack, with the provisos that two or more 
were never inserted together, and a target card was not at the top or bottom of the 
pack. The example of the target (the fifth card) was placed on top of the pack. A toy 
postbox, with a picture of the animal concerned (e.g., if the target word was ‘‘cow’’, 
a picture of a cow was on the postbox) was put in front of the child. He or she was 
told that the task was to decide which cards were to be posted to the animal, and 
which were not. Only those cards with the animal's name on were to be posted and 
the reject cards given to the experimenter. The example target card was given to the 
child, who was told what it said, then told to look at the card very carefully, and 
then post it. The child then worked through the pack of cards. He or she was not 
allowed to see the target after it had been posted. This was to impose a memory load 
upon the child so ‘‘that the errors would show more clearly the elements of the word 
that the child was able to pay attention to’’, (Dodd, 1982, p. 62). This procedure was 
used for each of the five target words in turn. The order of presentation of the words 
was varied using a Latin square design. 
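
The pack-building rule described above (11 shuffled transformation cards, four target cards folded in so that no two targets are adjacent and none lies at the top or bottom) can be illustrated with a short Python sketch. This is only one way of satisfying the stated constraints, not a record of how the packs were actually assembled; the function name and the placeholder card labels are hypothetical.

```python
import random

def build_posting_pack(transformations, target, n_targets=4, rng=None):
    """Return a shuffled pack with target cards folded in.

    Constraints taken from the procedure: no two target cards are adjacent
    and no target card is at the top or bottom of the pack.
    """
    rng = rng or random.Random()
    pack = list(transformations)
    rng.shuffle(pack)
    # Gap i lies between pack[i] and pack[i + 1]. Choosing distinct interior
    # gaps keeps targets apart and away from both ends of the pack.
    gaps = rng.sample(range(len(pack) - 1), n_targets)
    # Insert from the highest gap downwards so earlier indices do not shift.
    for gap in sorted(gaps, reverse=True):
        pack.insert(gap + 1, target)
    return pack

# Hypothetical labels standing in for the 11 transformations of "cow".
cards = [f"transformation {i}" for i in range(1, 12)]
pack = build_posting_pack(cards, "cow", rng=random.Random(1))
```

The example card (the fifth copy of the target) is not part of this pack; in the procedure it is shown to the child first and then posted.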


Dodd’s word matching task 

The sets of 15 cards in the posting task were used. The child had to match the 
cards to a picture of the animal and the target word printed in larger letters than the 
cards (to prevent direct matching by word length or size). The 15 cards were 
distributed as for the word posting task. The child had to pick out all the cards 
which gave the animal’s name like the bigger printed card. The example card was 
again used to give the child an idea of the task. In Dodd’s task, the target card was 
then placed in view of the child above the picture. It was decided that this could lead 
to the child matching by direct visual comparison. Thus, in this experiment, the 
child’s choices were placed in two face-down piles. The same procedure was adopted 
for each of the five words. The order of presentation of these words was varied from 
child to child by a Latin square design. 
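
As an illustration of the Latin square ordering, the following Python sketch generates presentation orders for the five target words. It assumes a simple cyclic square and assigns rows to children in rotation; the paper does not state which square or assignment rule was used, so both are assumptions, and the function names are hypothetical.

```python
TARGET_WORDS = ["cow", "hen", "mouse", "donkey", "elephant"]

def cyclic_latin_square(items):
    """Build an n x n square in which each item appears once per row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

def presentation_order(child_index, items=TARGET_WORDS):
    """Give each successive child the next row of the square (assumed rule)."""
    square = cyclic_latin_square(items)
    return square[child_index % len(items)]
```

Across any five consecutive children, each word then appears exactly once in each serial position.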


Reading style analysis 

The measure used here is adapted from Dodd (1982). Each child read for the 
experimenter from his or her current reading book on three consecutive schooldays. 
Each child read at least 300 words over this period. Reading was terminated at a 
suitable point in the story. Errors were recorded in pencil in a copy of the book the 
child was reading. Any deviation in the child’s reading away from the presented text 
was recorded. These errors were marked and classified into the following types: 


(a) Substitutions: where a word was misread. Marked by crossing out the word 
in the text, and writing the substituted word above it. 


(b) Insertions: where a word is added to the text. Marked by entering a caret (^) 
at the point of insertion, and writing the inserted word above it. 


(c) Omissions: where a word is not read, or is partly read, but is then skipped 
over with no discernible break in reading. Marked by crossing out the omitted 
word. 


(d) Non-response: where a word is not read, or is partly read, and instead of 
“skipping over” it, the child stops. Dodd's criteria for assessing this were used — 
i.e., a pause of at least 10 seconds, or a statement from the child to the effect that 
they did not know the word. The experimenter supplied the word, and the child was 
instructed to carry on reading. A non-response was marked by drawing brackets 
around the word. 


Eccentric pronunciations, pauses and repetitions were not recorded. Except in 
the case of non-responses, the experimenter did not correct the child unless asked to 
by the child, or if the child was constantly misreading a word, or if previous errors 
were leading to confusion. If the child made an error and then corrected him or 
herself, this was recorded as a self-correction (marked by writing ‘‘s.c.’’ above the 
word). These were fairly rare, and were not included in the analysis. 


The substitution errors were further classified into the following types: 


(a) Those that were only contextually acceptable, and bore no graphemic 
similarity to the stimulus word (e.g., reading ‘‘the white cat’’ for ‘‘the black cat’’). 


(b) Those that were only graphemically acceptable, and made no contextual 
sense (e.g., ‘‘the cat was blue’’ for ‘‘the cat was big’’). A substitution was held to be 
graphemically similar if it began with the same letter as the stimulus word. This 
apparently crude measure was devised by Biemiller, who in justification cited 
Weber's (1970) finding that the first letter mistake was strongly correlated with other 
graphemic similarities between the substitution and target words. Dodd points out 
that Weber placed a strong weighting on initial letter correspondence in drawing up 
her similarity index. However, there are two points in favour of Biemiller's measure. 
The first is that it is simple, easy to use and unambiguous. The second is that it 
would seem to be a good measure. In this present study, the number of substitution 
errors which began/ended with the same sound and also the number which ended 
with the same grapheme were also calculated, out of interest. The differences 
between the scores registered by this more comprehensive measure and Biemiller’s 
were negligible. 


(c) Those that were contextually and graphemically acceptable (e.g., ‘‘the brave 
boy’’ for ‘‘the big boy’’). 


(d) Those errors which were neither graphemically nor contextually acceptable 
(e.g., ‘‘the pink rhinoceros’’ for ‘‘the grey rhinoceros’’). 
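
The four-way classification of substitutions rests on two binary judgments: whether the error shares its first letter with the stimulus word (Biemiller's criterion) and whether it makes contextual sense. A minimal Python sketch follows. It assumes the contextual judgment is supplied by the scorer, since it cannot be derived from spelling; the function name and category labels are illustrative rather than taken from the study.

```python
def classify_substitution(stimulus, response, contextually_acceptable):
    """Classify one substitution error into one of the four sub-types.

    Graphemic similarity uses the first-letter criterion; contextual
    acceptability is a human judgment passed in as a boolean.
    """
    graphemic = stimulus[:1].lower() == response[:1].lower()
    if graphemic and contextually_acceptable:
        return "contextual and graphemic"
    if graphemic:
        return "pure graphemic"
    if contextually_acceptable:
        return "pure contextual"
    return "nonsense"

# The paper's own examples, with the contextual judgments it reports.
examples = [
    ("black", "white", True),   # (a) pure contextual
    ("big", "blue", False),     # (b) pure graphemic
    ("big", "brave", True),     # (c) contextual and graphemic
    ("grey", "pink", False),    # (d) nonsense
]
labels = [classify_substitution(s, r, c) for s, r, c in examples]
```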


Design 

All children were tested individually. An assistant first tested them on the 
Carver and phonemic awareness tests. Children were judged as not having phonemic 
awareness if they scored zero on all sections of both tests. Children were held to have 
phonemic awareness if they scored above zero on an alliteration and a rhyme section 
of at least one of the two tests (in fact, all children scored on every section of the two 
tests, with the exception of two, who failed the output rhyme task). Children were 
then matched (one with phonemic awareness, one without) on the basis of their 
Carver scores being within two points of each other. These initial pairings also 
matched very well for chronological age. Pairs were always drawn from the same 
school. Twenty pairs were drawn from 154 pupils of four schools (one school 
provided six pairs, another four, and the other two provided five each). 
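
The pairing constraints (one aware and one non-aware child, same school, Carver scores within two points) can be sketched in Python as below. The paper does not say how candidate pairs were chosen when several partners were available, so the greedy nearest-score rule here is an assumption, as are the `Child` record and the function name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Child:
    name: str
    school: str
    carver: int   # Carver Word Recognition Test score
    aware: bool   # True if classified as phonemically aware

def match_pairs(children, max_diff=2):
    """Greedily pair each aware child with the closest-scoring unused
    non-aware child from the same school (assumed selection rule)."""
    aware = sorted((c for c in children if c.aware), key=lambda c: c.carver)
    pool = [c for c in children if not c.aware]
    pairs = []
    for a in aware:
        candidates = [c for c in pool
                      if c.school == a.school and abs(c.carver - a.carver) <= max_diff]
        if candidates:
            partner = min(candidates, key=lambda c: abs(c.carver - a.carver))
            pool.remove(partner)  # each child can belong to only one pair
            pairs.append((a, partner))
    return pairs
```

Children who cannot be matched under these constraints are simply left unpaired, which is consistent with only 20 pairs being obtained from 154 pupils.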


The matched pairs were then tested by the author within a week of their pairing. 
The children were blind tested to avoid possible biasing effects in the reading style 
analysis. They were tested on four consecutive schooldays, and one day one week 
later. The first three days were spent on the reading style analysis, and the fourth on 
the posting task. A week later, they were tested on the matching task. Dodd’s tasks 
always followed the reading style analysis. 


Scoring 

Scoring on the Carver and phonemic awareness tests was as detailed before. 
One point was given for each error on the reading style analysis. Each error on 
Dodd’s posting and matching tasks was also given one point. In order to compensate 
for there being two letter strings for some transformations but not for others, scores 
for the ‘‘single letter string” transformations (middle doubled, reversal and middle 
missing) were doubled. 
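
The weighting applied to the posting and matching scores can be written out as a small Python sketch. The three single-string transformation names come from the text; any other key is treated as a two-string transformation, and the function name and the example key are hypothetical.

```python
SINGLE_STRING_TRANSFORMATIONS = {"middle doubled", "reversal", "middle missing"}

def weighted_error_score(errors_by_transformation):
    """Sum error points, doubling those for single-letter-string transformations."""
    total = 0
    for transformation, n_errors in errors_by_transformation.items():
        weight = 2 if transformation in SINGLE_STRING_TRANSFORMATIONS else 1
        total += weight * n_errors
    return total

# One reversal error (doubled) plus two errors on a hypothetical
# two-string transformation gives a score of 4.
score = weighted_error_score({"reversal": 1, "two-string example": 2})
```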


RESULTS 

Pairing of subjects 

The mean ages of the two groups (in months) were: aware, 66.0 (sd = 2.57); 
non-aware, 66.1 (sd = 2.38). As far as possible, children were paired by sex. There 
were nine all-male, six all-female, and five mixed pairs. The mean scores of the 
aware children on the four phonemic awareness subtests were: input rhyme, 17.55 
(sd = 2.5); input alliteration, 17.55 (sd = 1.85); output rhyme, 4.11 (sd = 0.87); 
output alliteration, 4.6 (sd = 0.63). The non-aware children did, of course, score 
zero on all these tests. The mean Carver scores of the two groups were: aware, 17.4 
(sd = 7.76); non-aware, 16.4 (sd = 8.02). This difference was not significant 
(paired t test: t(19) = 0.98, P > 0.05). (According to the Carver scoring tables, these 
scores give the subjects a reading age of 4.3-4.6. However, Carver's test has not been 
standardised on this age group. The test was standardised on a group of 7+-year- 
olds, and the ‘‘reading ages’’ below this were calculated by extrapolating a straight 
line back from the test sample’s results (Carver, 1970).) In addition, teacher ratings 
of reading ability were obtained on a scale of 1 to 5 (1 = well below average; 2 = 
below average; 3 = average; 4 = above average; 5 = well above average). Aware 
children had a mean rating of 3.43, compared with the non-aware children’s 2.83. 
This was not a significant difference (paired t test: t(19) = 1.98, P > 0.05). 
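
For readers unfamiliar with the paired t test used for these group comparisons, the statistic is the mean of the within-pair differences divided by its standard error, with n - 1 degrees of freedom (19 for the 20 matched pairs). A minimal Python sketch follows; the function name is illustrative and the scores in the example are invented for demonstration, not data from the study.

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Return (t, df) for a paired t test on two equal-length score lists.

    Raises ZeroDivisionError if every within-pair difference is identical.
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Invented scores for three hypothetical pairs (not the study's data).
t_value, df = paired_t([3, 5, 7], [1, 2, 3])
```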


TABLE 2 

MEAN NUMBER OF NON-TARGETS INCORRECTLY POSTED OR MATCHED BY PHONEMICALLY AWARE AND 
NON-AWARE GROUPS, WITH STANDARD DEVIATIONS, CLASSIFIED BY ANIMAL NAME 

                                  Cow      Hen      Mouse    Donkey   Elephant 
POSTING TASK        Aware         0.6      0.7      0.95     1.42     1.25 
(Error scores/11)                (1.14)   (0.95)   (1.08)   (1.37)   (0.8) 
                    Non-aware     2.82     2.42     2.52     3.12     3.07 
                                 (2.22)   (2.15)   (1.8)    (2.41)   (3.07) 
MATCHING TASK       Aware         0.35     0.12     0.72     0.72     0.82 
(Error scores/11)                (0.58)   (0.39)   (0.73)   (0.73)   (0.69) 
                    Non-aware     1.12     0.75     1.25     1.43     1.82 
                                 (1.56)   (1.16)   (1.29)   (1.41)   (1.32) 





Posting and matching tasks 

Mean number of non-targets incorrectly posted and matched by the two groups 
are shown in Tables 1 and 2. A three-way ANOVA of the posting task, with factors 
Group, Animal names, and Error type, showed highly significant effects of all three 
factors. There was also a significant name-error type interaction. However, there 
was no significant interaction between group and error type and/or animal name 
(see Table 3). This absence of a group interaction suggests that the phonemically 
aware and non-aware children were using the same strategy to perform the task, but 
that the aware were far more adept at using it. The patterning of errors was 
equivalent to that found by Dodd. That is, the longer words were harder than the 
shorter ones, and reversal errors were especially prevalent in the shorter words. The 
same pattern of results was found in a three-way ANOVA for the matching task (see 
Table 3), but the F value for the group difference is reduced from 13.99 in the 
posting task to 7.21. This suggests that part of the group difference may be due to a 
mnemonic factor. A small number of errors occurred with targets which were not 
posted or matched. The pattern of these results was very similar to that of the non- 
target errors, but their frequency was too low to permit statistical analysis. 

The mean errors, collapsed across groups, were for the posting task 17.33, and 
for the matching task 7.15. The children were tested towards the end of the spring 
term. The results are roughly comparable to Dodd's subjects at their second test 
session in the summer term of their first year (17.2 and 7.1 respectively). Overall, 
there would seem to be a high degree of consistency between Dodd's and this study's 
findings. 


Reading style analysis 

The mean numbers of words read by the two groups were: aware, 426 (sd = 
17.8); non-aware, 394 (sd = 21.4). This difference was not significant (paired t test: 
t(19) = 0.01, P = 0.33). The mean error scores are shown in Table 4. The results 
were subjected to a two-way ANOVA, the factors being group and error type. The 
analysis found significant group (F(1,19) = 10.66, P < 0.01); error type (F(3,19) 
= 66.23, P < 0.001); and interaction effects (F(1,57) = 4.05, P < 0.01). Thus, the 
aware group were not only superior to the non-aware group, but also appeared to be 
adopting a different reading style. Attention was then turned to the substitution 
errors. There was no significant group difference in the ratio of substitution to total 
errors (paired t test: t(19) = 0.42, P > 0.05). However, there was a significant group 
difference in the distribution of types of substitution errors (see Table 5). A two-way 
ANOVA of these results found a significant group difference (F(3,19) = 10.04, P 
< 0.01) but no significant difference between error types (F(3,19) = 2.09, 
P > 0.05). However, there was a significant group-error type interaction 
(F(3,57) = 3.6, P < 0.025). 


TABLE 3 

THREE-WAY ANOVA RESULTS WITH FACTORS POSSESSION/NON-POSSESSION OF PHONEMIC AWARENESS, 
TYPE OF TRANSFORMATION AND ANIMAL NAME, FOR POSTING AND MATCHING TASKS 

            Source of Variance         df     MS       F        P 
POSTING     Awareness (‘‘A’’)           1     99.9     13.99    0.0006 
            Animal (‘‘B’’)              4     2.12     5.26     0.0005 
            A x B                       4     0.40     0.98     0.4188 
            Transformation (‘‘C’’)      6     6.18     12.13    <0.0000 
            A x C                       6     0.91     1.78 
            B x C                      24     0.78     2.27     0.0005 
            A x B x C                  24     0.44     1.29     0.1612 
MATCHING    Awareness (‘‘A’’)           1     14.2     7.21     0.0107 
            Animal (‘‘B’’)              4     2.21     6.92     <0.0000 
            A x B                       4     0.17     0.52     0.7237 
            Transformation (‘‘C’’)      6     5.17     17.18    <0.0000 
            A x C                       6     0.23     0.76     0.6051 
            B x C                      24     1.27     4.62     0.0000 
            A x B x C                  24     0.31     1.13     0.3001 


TABLE 4 

MEAN ERROR SCORES OF PHONEMICALLY AWARE AND NON-AWARE GROUPS IN ERROR ANALYSIS, WITH 
STANDARD DEVIATIONS 

              Mean No. 
              Words read   Substitution   Insertion   Omission   Non-response 
Aware         426          19.25          2.35        4.1        5.4 
              (17.8)       (6.75)         (2.92)      (7.07)     (4.22) 
Non-aware     394          29.75          2.9         3.6        9.95 
              (21.4)       (14.25)        (4.01)      (6.81)     (9.38) 

TABLE 5 

MEAN NUMBER OF SUB-TYPES OF SUBSTITUTION ERROR BY PHONEMICALLY AWARE AND NON-AWARE 
GROUPS, WITH STANDARD DEVIATIONS 

              Total No. 
              Substitution   Pure        Pure       Graphemic 
              Miscues        Graphemic   Semantic   and Semantic   Nonsense 
Aware         19.25          5.5         4.4        4.85           4.5 
              (6.75)         (3.17)      (2.41)     (3.73)         (3.01) 
Non-aware     29.75          6.6         10.3       5.05           8.3 
              (14.25)        (5.54)      (7.63)     (5.18)         (4.21) 





Thus, as for the Dodd tasks, aware children were superior to non-aware 
children. Unlike the Dodd task, however, there are interactions. Hence the group 
difference is probably attributable to the two groups employing different reading 
styles. 


DISCUSSION 

Posting and matching tasks 

Dodd questioned whether ability at these tasks reflected ‘‘realisation of the 
importance of letter identity” or the ‘‘ability to cope with an increasing number of 
letters”. This study indicates that phonemic awareness, and hence the realisation of 
what graphemes represent, is the important skill. However, the absence of an 
interaction suggests that aware and non-aware children are attending to the same 
aspects of word structure. What are these? One possibility is that the children judge 
the words by what they sound like read out loud. This seems unlikely because all the 
non-targets very obviously ‘‘don't sound right’’ when read out loud, or are 
unpronounceable. Neither can the children be attending purely to the visual shape of 
the words since, although in the longer words the most visually similar 
transformations are the hardest to detect, in the shorter words the visually dissimilar 
“reversals” are the most difficult to spot. Thus, the children are not adopting a 
simple phonemic or visual strategy. 


The most parsimonious explanation is that children look for transformations in 
individual letters. If children are making a letter-by-letter search, then, logically, the 
longer the word the greater the probability they will miss an alteration. Hence, the 
observed increased number of errors with word length. However, why the reversal 
errors in the shorter words? Dodd argues that they reflect a failure to always read 
from left to right. This might be aided by two factors. (1) Most short words spelt 
backwards are still pronounceable. (2) In scanning a word letter by letter, the child 
may scan the word “backwards” as well as “forwards”. Scanning backwards, he or 
she might temporarily forget the left to right rule, and identify the reversal non- 
target as the target. The probability of this happening is greater in short than long 
words, since in reading a long word backwards, the child has greater time in which 
to realise his or her mistake. A pre-school child spends a long time learning that seen 
objects are the same despite changes in orientation. In learning to read, the child 
encounters for possibly the first time items (words) which can only be correctly 
interpreted in one orientation. Possibly it is the conflict between his or her 
established perceptual “rules” and the newly-acquired left-to-right rule which 
causes the reversal errors. 


How might the effect of phonemic awareness be explained by this letter search 
theory? Graphemes are largely meaningless without an awareness of the phonemes 
they represent. Thus, the acquisition of phonemic awareness should cause the child 
to become aware of the purpose of graphemic structure. From this, it follows that he 
or she should also realise the importance of graphemic structure. This could have 
three effects. (1) Children with phonemic awareness should realise that any 
alteration of graphemic structure alters the word, and thus be vigilant in searching 
for changes. Non-phonemically aware children, lacking this realisation, should be 
more lax. (2) The aware children are probably more used to using graphemic cues in 
reading, and therefore are more familiar with letter-by-letter processing than non- 
aware children. (3) It is possible that non-aware children might detect a graphemic 
change, but, because they do not realise its importance, accept the changed word as 
a close enough resemblance to the target and classify it as such. There is some 
evidence for this from the verbal protocols of the non-aware children. Some of them 
made utterances such as “this will do”, or “it’s near enough”, when 
posting/matching a non-target. 


It would seem from this that, whilst non-aware children can examine words 
letter by letter, their criterion for word identity is not the grapheme. Thus, an effect 
of the acquisition of phonemic awareness is to make word identification more 
stringent. In time, this added stringency should create a more accurate word 
recognition process. 


However phonemic awareness influences sensitivity to graphemic structure, it is 
certainly very effective. The mean error rates for the aware group were 4.92 
for the posting, and 2.73 for the matching task (compared with the non-aware 
group's figures of 13.95 and 6.37 respectively). The implications of this changed 
awareness are considerable. Spelling should obviously benefit from this newly- 
acquired knowledge. If the child realises that every letter of a word has to be correct, 
then he or she will try for greater accuracy. Reading too should benefit. The child 
should make fewer errors if his or her judgment of word identity is based upon 
precise spelling rather than what the word “looks like”. This will have a “domino” 
effect. If the child's word identification strategy becomes more efficient, then he or 
she will read more fluently and extraction of meaning will improve. If this improves, 
then the reader will have more contextual cues to speed up word recognition. Thus, 
the effect of the acquisition of phonemic awareness may spread beyond its initial 
bounds. There is support for this argument from the findings of the studies cited in 
the introduction, which showed that phonemic awareness is strongly correlated with 
reading ability. 


Reading style analysis 

Given the possible “diluting” of the influence of awareness in “normal” 
reading described above, the finding of a group difference in errors of 8 versus 13 
per cent is considerable. (It should be stressed that many of these errors were not 
“serious”, i.e., normally demanding correction.) We shall look briefly at each 
error type, before attempting to create an overall picture of phonemic awareness's 
effect upon reading style. 


The groups showed little difference in the number of insertion errors made. It 
may be suspected that they reflect the loquaciousness of the child, rather than a 
reading-specific trait. Nearly all the insertions made contextual sense and probably 
represented a desire to inject some zest into the story (it often needed it). Dodd 
found insertion errors to be prevalent only in initial reading, and attributed them to 
a desire to “tell a story”, coupled with an incomplete concept that printed words tell 
all of the story (Ehri, 1979). There was also little difference between groups in the 
omission errors. The aware group's slightly higher score may be a side effect of 
faster reading. 


More substantial is the group difference in N-R errors, which is statistically 
significant (t test: t(19) = 2.10, P < 0.05). Two explanations for the non-aware 
group's larger number of errors spring to mind. First, they were unable to make use 
of the contextual information to have a guess at a word's identity. This is highly 
unlikely, given the evidence discussed above showing that beginning reading is 
characterised by use of contextual cues. Second, non-aware children could not use 
graphemic/phonemic cues to “build up” a word, and thus gave up. This may 
partially explain the N-R miscues. Certainly non-aware children seem to “build up” 
fewer words than children with phonemic awareness. Children's attempts to verbally 
build up words were noted, out of curiosity: 16/20 aware children built up, 
compared with 6/20 non-aware children. This is not a foolproof measure. Some 
“building up” could represent false starts of pronunciation, whilst other children 
may build up silently. Nonetheless, it may be a useful general guide, and it is 
reasonable to assume that some of the N-R errors may be due to a failure to use 
graphemic/phonemic skills. However, this cannot be the complete explanation, or 
we would expect aware children's N-R errors to be virtually non-existent. The most 
prosaic explanation is that a N-R miscue is due to the child being unable to read the 
word by any method at his/her disposal. Readers with phonemic awareness make 
fewer N-R miscues simply because they have more skills. 


No evidence was found for Biemiller's N-R phase (i.e., a child with over 50 per 
cent N-R errors) although, given that the children were only seen at one point in 
time, this may not be surprising. However, Biemiller's explanation that N-R is due 
to a “combat” between graphemic and contextual cues seems unlikely, given the 
evidence presented here. As we have seen, the rise in graphemic skills with phonemic 
awareness produces a fall, not a rise, in N-R miscues. 


As we saw above, there is no group difference in the proportion of gross 
substitution miscues, but there is a significant difference in the distribution of sub- 
types. Perhaps the easiest way to interpret these results is to divide the substitution 
errors into those with some graphemic content (the pure graphemic and the 
graphemic and semantic) and those with none (the pure semantic and meaningless). 
By this reckoning, the errors of children with phonemic awareness are 54 per cent 
graphemic, 46 per cent non-graphemic, and those of children without awareness are 
39 per cent graphemic, 61 per cent non-graphemic. The non-aware children's results 
seem to reflect their lack of graphemic skills. We have already seen that they lack 
sensitivity to graphemic change in novel words. We now have evidence that they pay 
less attention to graphemic structure in normal reading than children with phonemic 
awareness. Although the acquisition of phonemic awareness brings with it a shift in 
graphemic processing, it is interesting to note that the change is not to a reading style 
totally dominated by it: 50 per cent of substitutions of aware children are non- 
graphemic. Effects of phonemic awareness may be limited in at least two ways: (1) 
The aware children's knowledge of, and ability to utilise, graphemic/phonemic 
information is not, of course, made perfect by their acquisition of awareness, and 
they still have to learn the seemingly myriad number of irregular spellings. (2) It is 
possible that the children were failing to co-ordinate separate cues, and were judging 
word identity by individual cues. That is, the children might read some words by 
what the context dictated, disregarding the graphemic cues, and vice versa. This 
failure to integrate accords with Biemiller's and Dodd's observations of early 
reading. 
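
The graphemic versus non-graphemic split quoted above can be checked against the mean counts of the substitution sub-types. What follows is only a minimal sketch in Python. It assumes that the decimal points in the table are restored (e.g., 4.85 and 4.4) and that the four sub-type columns are, in order, pure graphemic, pure semantic, graphemic and semantic, and meaningless. The names `MEANS`, `GRAPHEMIC_TYPES` and `graphemic_split` are illustrative and not part of the original study.

```python
# Mean substitution miscues per child, by sub-type.
# Assumption: column order and restored decimal points as described above.
MEANS = {
    "aware": {
        "graphemic": 5.5,
        "semantic": 4.4,
        "graphemic_and_semantic": 4.85,
        "meaningless": 4.5,
    },
    "non_aware": {
        "graphemic": 6.6,
        "semantic": 10.3,
        "graphemic_and_semantic": 5.05,
        "meaningless": 8.3,
    },
}

# Sub-types counted as having "some graphemic content".
GRAPHEMIC_TYPES = ("graphemic", "graphemic_and_semantic")


def graphemic_split(group):
    """Return (graphemic %, non-graphemic %) of a group's substitutions."""
    counts = MEANS[group]
    total = sum(counts.values())
    graphemic = sum(counts[name] for name in GRAPHEMIC_TYPES)
    share = 100 * graphemic / total
    return round(share), round(100 - share)


for group in MEANS:
    print(group, graphemic_split(group))
```

Under these assumptions the sketch gives 54 and 46 per cent for the aware group and 39 and 61 per cent for the non-aware group, in agreement with the figures in the text.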


The reading style analysis clearly shows a superior performance by the aware 
children, which seems to be derived from better graphemic processing. The groups 
showed virtually no differences in the ‘‘neutral’’ insertion and omission errors. That 
phonemic awareness should influence graphemic processing is of course what we 
have argued throughout. The results clearly indicate that the group differences in the 
posting and matching tasks were not task-specific, but generalise to normal reading. 
But the influence of phonemic awareness is not just confined to a general sharpening 
up of graphemic skills; there are signs that it affects other aspects of reading 
performance. Most clearly, we can see that children with awareness had a lower 
miscue rate. This was due to superior graphemic skills, as we have seen. However, 
we note that they also made fewer errors which failed to convey any meaning — that 
is, N-R and meaningless substitution miscues. Thus, even when misreading, they 
were less likely to disrupt the ‘‘flow’’ of the story. Furthermore, the fact that they 
seemed able to process novel words better should enable readers with phonemic 
awareness to assimilate new words into their vocabularies more easily. Moreover, 
if the child can develop the skill of building up words for himself or herself, then more 
time can be spent reading independently, and less time queuing to see the teacher 
for the meaning of a new word. At the risk of being repetitive, the acquisition of 
phonemic awareness means that lack of awareness of graphemic/phonemic 
structure is no longer a problem. 


However advantageous phonemic awareness’s acquisition appears from the 
present evidence, it should be remembered that, to all intents and purposes, the aware 
and non-aware groups were matched for reading ability by an established reading 
test and also by teacher ratings. The effects we have observed do not lead to 
glaringly obvious differences from a casual observation of reading performance. 
The point to be made here is that, although the two groups are ostensibly the same 
now, in time the difference in group performance may become very apparent. The 
results of Bradley and Bryant’s studies (1978, 1983) show very clearly the effects of 
early acquisition of phonemic awareness. It should be noted that many of the 
children without phonemic awareness made graphemic substitution errors and built 
up words. Thus, they must have been employing some kind of graphemic strategy. 
However, whatever strategy was employed it was one without insight into phonemic 
structure. This implies that the behaviour of the non-aware readers was in effect 
being performed by rote. Such findings as this question the usefulness of teaching 
word attack skills to beginning readers without first ensuring that they know why 
they are being done. Judging “elephank” to be “near enough” hardly bodes well 
for phonics teaching. 


This study has uncovered several new facets to phonemic awareness. We have 
seen that it greatly improves graphemic sensitivity, both in letter-by-letter searches 
of isolated novel words, and in the graphemic processing of ‘‘normal’’ text. We have 
seen how this, in turn, leads to children with awareness having a better reading style 
than non-aware children. This difference is not only confined to graphemic 
processing, but also probably generally improves reading. Though these differences 
are not apparent upon a casual observation, they probably are the first indicators of 
later, and much more apparent differences. However, the main purpose of this 
study has been to show not that phonemic awareness is important for reading (which 
we have already seen), but how it is important. 
APPENDIX 1 

NATURE OF INPUT PAIRS IN PHONEMIC AWARENESS 
Target pairs 
Full rhyme: identical vowel and consonant ending (e.g., bag, wag). 
Half rhyme: identical consonant ending only (e.g., fat, cut). 
Full alliteration: identical vowel and consonant beginning (e.g., bat, ball). 
Half alliteration: identical consonant beginning only (e.g., boy, bag). 
Non-target pairs. 
Full non-targets: only medial vowel in common (e.g., boy, dog). 
Only coupled with “full rhyme” and “full alliteration” pairs. 
Half non-targets: no letters in common (e.g., bag, cof). 
Only coupled with “half rhyme” and “half alliteration” pairs. 
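
The pair types defined above can be expressed as a simple rule over the three parts of a short word. The sketch below is illustrative only. It assumes that spelling stands in for sound, and that each word consists of one initial consonant letter, one vowel letter, and a consonant ending of one or more letters. It also compares letters position by position, which is a narrower test than "no letters in common". The function names `split_word` and `classify_pair` are hypothetical.

```python
def split_word(word):
    """Split a short word into (initial consonant, vowel, consonant ending).

    Assumption: one initial consonant letter and one vowel letter,
    as in 'bag' or 'bell'.
    """
    return word[0], word[1], word[2:]


# Keys record which parts match: (same beginning, same vowel, same ending).
PAIR_TYPES = {
    (False, True, True): "full rhyme",
    (False, False, True): "half rhyme",
    (True, True, False): "full alliteration",
    (True, False, False): "half alliteration",
    (False, True, False): "full non-target",
    (False, False, False): "half non-target",
}


def classify_pair(first, second):
    """Name the pair type, or 'unclassified' if no definition applies."""
    parts_a = split_word(first)
    parts_b = split_word(second)
    matches = tuple(a == b for a, b in zip(parts_a, parts_b))
    return PAIR_TYPES.get(matches, "unclassified")


for pair in [("bag", "wag"), ("fat", "cut"), ("bat", "ball"),
             ("boy", "bag"), ("boy", "dog")]:
    print(pair, classify_pair(*pair))
```

Applied to the examples given in the definitions, this rule returns the category under which each example is listed.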


APPENDIX 2 

WORDS USED IN THE PHONEMIC AWARENESS TESTS 
Input 
Full rhyme: bed, fed; well, bell; lot, pot; but, nut; tub, rub. 
Full non-target: cat, man; ten, leg; cut, run; top, lot; pen, red. 
Half rhyme: run, hen; not, but; doll, well; fill, ball; gun, ran. 
Half non-target: leg, box; sat, pin; run, big; cup, red; got, pen. 
Full alliteration: bed, bell; cup, cut; box, boy; run, rub; fed, fell. 
Full non-target: cat, man; top, got; hill, big; fun, cup; doll, not. 
Half alliteration: bed, bag; lot, lip; fat, fed; bell, big; wall, wet. 
Half non-target: cat, dig; but, sad; bat, tub; cup, big; box, red. 
Output 
Rhyme: bed; log; pot; fell; gun. 
Alliteration: cut; pen; sat; box; dog. 
TARGET WORDS AND TRANSFORMATIONS OF TARGET WORDS USED IN DODD'S POSTING AND MATCHING TASKS 


Target:                 COW    HEN    MOUSE     DONKEY     ELEPHANT 
Middle doubled:         COOW   HEEN   MOUOUSE   DONONKEY   ELEPHAPHANT 
Reversal:               WOC    NEH    ESUOM     YEKNOD     TNAHPELE 
Middle missing:         CW     HN     MSE       DOEY       ELANT 
End letter different:   COM    HEC    MOUSN     DONKEG     ELEPHANK 
                        COP    HEP    MOUST     DONKEW     ELEPHANS 
Initial different:      MOW    FEN    COUSE     LONKEY     OLEPHANT 
                        DOW    CEN    BOUSE     SONKEY     BLEPHANT 
Initial only same:      CUM    HUW    MARIN     DISBUG     EBOGFICK 
                        CUG    HUK    MANTY     DISCAR     ECOSONIC 
Jumbled:                WCO    NHE    OSMUE     DEGNOK     PHENATLE 
                        OCW    ENH    SUMEO     KNODEY     PETHLANE 


Notes: 
1. All transformations are intentionally non-words. 
2. The difference between the two words is in the presence or absence of an ascender for the 
changed letter. The first word in each category has a form similar to the target, the second word the 
opposite. 
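
Of the transformations listed, the reversal is the one that follows a purely mechanical rule, so the listed items can be checked directly. The following Python sketch is illustrative only; the names `REVERSALS`, `reverse_word` and `is_reversal` are hypothetical and do not come from Dodd's procedure.

```python
# Target words and their "Reversal" non-targets as listed in the table.
REVERSALS = {
    "COW": "WOC",
    "HEN": "NEH",
    "MOUSE": "ESUOM",
    "DONKEY": "YEKNOD",
    "ELEPHANT": "TNAHPELE",
}


def reverse_word(word):
    """Spell the word backwards, letter by letter."""
    return word[::-1]


def is_reversal(target, candidate):
    """True if the candidate is the target spelt backwards."""
    return candidate == reverse_word(target)


all_match = all(is_reversal(t, c) for t, c in REVERSALS.items())
print(all_match)
```

The other transformations (for example, which middle letters are doubled or removed) depend on choices that the table does not state as a rule, so they are not sketched here.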
RECALL FROM SINGLE VERSUS DUAL THEME TEXTS 


By J. M. BILL 
(Department of Education, The Queen’s University, Belfast) 


Summary. The capacity of 10-year-olds in coping with a mixed (double theme) text was 
examined. In a ‘‘distributed’’ version of the mixed text a story theme was used to carry an 
additional load of historical information which was evenly distributed throughout the 
narrative. In a second ‘‘consolidated’’ version of the mixed text the history material 
occurred en bloc midway through the story. Performance on the mixed texts was 
compared with that in a ‘‘separated’’ condition where the story and history information 
were presented in two separate texts. Recall of the history material was substantially 
better in the separated text condition than in either of the other two and recall of the story 
was poorer in the consolidated than in either of the other two conditions. The findings are 
interpreted in terms of the effects of attentional set and the extra processing demands 
incurred in disentangling two different themes in a mixed type text. 


INTRODUCTION 


CONCENTRATING upon very simple narratives such as folk-tales, and drawing upon 
Bartlett's original notion of the text schema (Bartlett, 1932), theoretical analyses of 
text structure (Rumelhart, 1977; Van Dijk and Kintsch, 1977; Mandler and 
Johnson, 1977; Thorndyke, 1977; Stein and Glenn, 1979) have focused attention 
upon a search for the “idealised” schema or macrostructure that may be held to 
characterise a well-formed story. All text-structure models assume that stories have 
a characteristic organisation (or deep structure) which specifies, and governs the 
relationships among, the story elements (such as setting, theme, plot and 
resolution). It is further assumed that the experienced reader of stories has available 
in permanent memory an internalised representation similar to the idealised story 
schema which serves as an organisational framework for selecting and storing the 
story information at the comprehension stage and which is further activated at the 
recall stage to guide retrieval of the information. 

Whereas the notion of the schema has been a fruitful subject for text-processing 
research, schema-based theory has come under heavy criticism for its preoccupation 
with very simple and often contrived stories or, as is the case with script theory 
(Schank and Abelson, 1977), highly stereotypic event routines. The danger of over- 
generalising from such a restricted base to more natural texts is emphasised by Barr 
(1982) who cautions that “. . . specially contrived narratives or ritualised activities 
such as eating in restaurants or attending weddings may yield a misleading sense 
about the extent to which schemata are readily identified and shared.” Neisser 
(1982) also draws attention to the deficiencies in the ecological validity of story 
memory research, with the cryptic comment that “. . . hardly anyone memorises 
one-page stories except in the course of psychological experiments”. 


Apart from a few initial attempts (as in Meyer, 1977; Kintsch and Yarbrough, 
1982) to extend text-structure models to information- or expository-type texts, the 
study of how an individual learns from such materials has been left largely to the 
field of pedagogics and instructional theorists. This approach, though it may draw 
upon the levels of processing model (Craik and Lockhart, 1972; Cermak and Craik, 
1979) as a supportive framework, has concentrated upon the practical issue of how 
to induce the learner to engage in effective reading and study habits. It therefore 
tends to draw attention away from the text itself and focus it upon external 
manipulations such as the use of “advance organisers” (Ausubel, 1960), the 
teaching of instructional goals (Rothkopf and Billington, 1975), or the effect of 
adjunct aids such as embedded questions (e.g., Anderson and Biddle, 1975; Marton 
and Saljo, 1976). As a further consequence of this more practical approach, research 
employing information-type texts is more likely to test recall by the conventional 
pedagogic device of a set of comprehension questions in contrast to text-structure 
research which customarily examines free recall protocols for the extent to which 
they match the pre-defined structure of the original text. 


As noted already, factors external to the text, such as adjunct aids, can be 
manipulated to exercise an influence on the type and level of processing engaged in 
by the reader. However, increasing attention is now being paid to internal 
characteristics of the reader in terms of his existing “world knowledge” and its 
linkage with task-relevant activities. Thus the study of Chiesi et al. (1979) has shown 
how the reader with specialised knowledge of the game of baseball is more able to 
employ qualitatively superior processing strategies, such as selective attention and a 
more efficient use of working memory, in following an account of a baseball game. 
Working in a developmental context, Chi (1978) has also vividly demonstrated the 
importance of considering the effect of the stimulus material and its interaction with 
the memoriser's knowledge base. In the Chi study a surprising reversal of the usual 
age-related effect in memory (for chess positions) was revealed in the particular 
situation where the younger subjects were the experts and the adults relative novices 
in knowledge of the game of chess. Even in the case of the very young child, the 
importance of the stimulus material, together with the context in which the act of 
remembering occurs, is emphasised in the work of Soviet psychologists such as that 
of Istomina (1975, originally published 1948) in which children as young as age 3 or 
4 performed much better in following the goal of remembering shopping-list items in 
a role-play context than in recalling word-lists in a quasi-experimental setting. As 
Naus (1982) points out, children's effective use of strategies (such as rehearsal and 
organisation) in response to a request for memorisation does not depend solely on 
the potential availability of these techniques in the repertoire nor on their 
metamemory judgments (Brown, 1975; Flavell and Wellman, 1977) regarding their 
implementation but rather the deployment of task-relevant strategies seems also to 
depend heavily on the characteristics of the to-be-remembered material. 


In moving from the rather discrete stimulus material used in such studies as 
those of Istomina (1975) and Naus (1982) to that of connected discourse, it seems 
equally likely that the reader's experience of particular text types (such as 
imaginative prose, information texts) will also play a major part in determining the 
type of processing activity employed. Whereas, as Kintsch (1982) states it, the type 
of text often “announces itself”, it nevertheless cannot be concluded that this will 
always be apparent to the less experienced reader. Indeed, the capacity for flexible 
deployment of appropriate mnemonic strategies becomes an even more crucial topic 
for research in the case of mixed type texts, examples of which might include 
biography, newspaper style reporting, humorous articles intending to convey a 
serious point and so forth. Yet despite Kintsch's (1982) reminder that “pure” texts 
are rare in real life, the preoccupation of research with mono-thematic type texts 
continues. In the school setting, however, the use of mixed type texts is common, the 
assumption being that wrapping up ‘‘hard’’ information in biographical or narrative 
form is an effective pedagogical device for promoting learning of the ‘‘target’’ 
information. The study reported here is an initial attempt at testing this assumption 
by examining the capacity of primary school age children to cope with a mixed type 
text, in this case an historical narrative which uses a strong story line to convey a 
considerable extra load of historical information of a more didactic type, and to 
compare their performance on the mixed text with that in a number of alternative 
conditions in which the two themes, story and historical information, are separately 
presented. 


METHOD 
Sample 
Six complete Primary 6 classes (average age 10 years), drawn from two all-girls 
schools, were used to form the six groups in the study. In all there were 147 pupils 
who were present at both sessions of the study. 


Materials 

The materials, specially written for the study, consisted of three versions of a 
text comprising an historical narrative together with specific historical information. 
The material to be studied was intended to be typical of the kind of substantial 
classroom learning task familiar to the sample age group. In each of the three 
versions, the combined story and history material consisted of 760 words with 
exactly half apportioned to the story and half to the history. 


The story, set in a period of rebellion against the king, followed the familiar 
theme of a hero (a stable boy) setting out on a dangerous mission (to apprise his 
master that the castle had fallen into the hands of rebels masquerading as minstrels), 
meeting and overcoming a frustratory block (stumbling upon a group of rebels but 
managing to escape) and eventually reaching his objective (his master’s camp). In 
the distributed version of the text, story and historical information were interspersed 
as in the excerpt. . . . But the way they went was by a secret tunnel. This led them 
through the motte, the man-made mound on which the castle was built. By going 
this way they were able to avoid the minstrels. 


In the second version, designated consolidated, the historical information was 
introduced in one continuous passage half-way through the story at the point where 
the hero stumbles across the rebel band and overhears a conversation as in the 
following sample extracts. . . . This tall fellow was leaning against a mangonel the 
giant machine for hurling rocks. . . . As Mex listened he heard. . . . “But what can 
the French king do?” asked a rebel. “Well his plan is to attack Henry in France and 
that will delay his return to fight the rebellion here”, came the answer. At the end of 
the passage conveying the historical information, the story was resumed . . . Now 
that Mex knew who these men were, he turned to slip away and just as he thought 
he'd made good his escape... . 


In the separated text version, the history and story materials were detached 
from each other and presented as two separate texts. In this version, the story text 
followed exactly that of the other two versions. The history text was in 
straightforward expository form although couched in a fairly informal mode 
suitable for 10-year-old readers, thus: Did you know that for very many years 
England had been ruled by the Saxons? Then along came the Normans and they 
conquered the Saxons. . . . 


Due to the substantial length and fairly condensed nature of the material it was 
necessary to set a rather large number of questions as a comprehensive test of 
acquisition. A total of 34 questions were asked, 17 examining acquisition of story 
content and 17 history content. In each case there were 10 questions of an abstract 
type and seven of a factual nature. In order to reduce the strain of answering 34 
questions, the items were posed in multiple-choice form. Exactly the same questions 
were set for each of the groups but their order was adjusted in each condition to 
follow the order of the content to which they were related in the particular version of 
text experienced. As explained below a further control for possible order effects was 
found necessary in the separated condition. 
Design and procedures 

The study was designed to answer the question whether a pure type text, as in 
the separated condition, led to better recall than mixed type texts 
such as those in the distributed or consolidated conditions. However, in order to test 
this proposition, it was necessary to control for a possible order effect in the 
separated condition where the possibility arose that experiencing, for example, the 
history material first at either the study or recall stage could influence results in 
favour of the history material. Four classroom groups were thus required to 
counterbalance the order of experiencing the different materials in the separated 
condition. The first of these groups studied the history text first and then the story 
and took the history questions first followed by the story questions. A second group 
first studied the story, followed by the history text and experienced the same order 
of question content at recall. A third group studied history first and story second 
and recalled story material first followed by history material. The fourth group 
began with the story then studied the history text and experienced the reverse of this 
order of content at recall. 
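
The four-group counterbalancing scheme just described amounts to crossing two factors: which text is studied first, and whether the recall test keeps or reverses the study order. A minimal Python sketch of that crossing is given below. The function name `counterbalanced_groups` and the dictionary layout are illustrative assumptions, not part of the original procedure.

```python
from itertools import product

TEXTS = ("history", "story")


def counterbalanced_groups():
    """List the study and recall orders for the four separated-text groups.

    The groups are generated in the order in which the text describes them:
    the first two keep the study order at recall, the last two reverse it.
    """
    groups = []
    for recall_rule, first_text in product(("same", "reversed"), TEXTS):
        second_text = "story" if first_text == "history" else "history"
        study_order = (first_text, second_text)
        if recall_rule == "same":
            recall_order = study_order
        else:
            recall_order = study_order[::-1]
        groups.append({"study": study_order, "recall": recall_order})
    return groups


for number, group in enumerate(counterbalanced_groups(), start=1):
    print(number, group)
```

Each combination of study order and recall order appears exactly once, which is what allows order effects to be examined separately from the effect of text type.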


Trial testing of the texts indicated 10 minutes as a suitable study period. In the 
separated condition five minutes of study was allocated to the first passage (whether 
history or story) followed immediately by a further five minutes of study of the 
second passage. Before study commenced pupils were warned of a subsequent recall 
test which would cover all of the content of the material studied. Pupils responded 
to each question of the recall test in turn after the question and its four alternative 
responses were read aloud to them. Three days later an unannounced recall test, 
identical to the first, was administered, after which pupils completed the Mill Hill 
Vocabulary Scale (Raven, 1958) employed here as a check upon the equivalence of 
the six groups in verbal ability. 


RESULTS 
The first analysis undertaken was a check upon the equivalence of the six class 
groups on the Mill Hill Vocabulary test. The class means are reported in Table 1. 
Analysis of variance (F(5,146) = 0.08, P = 0.91) confirmed that the groups did not 
differ in verbal ability as measured by the Mill Hill test. 





TABLE 1 
VOCABULARY SCORES 

Groups      1       2       3       4       5       6       P 
Mean      13.50   13.54   13.83   13.46   13.52   13.54    NS 
N          22      22      24      24      29      26 


As explained above, in order to control for possible order effects at the study 
and recall stages, the separated text condition was split into four groups (classes 3-6) 
in which study of the history and story passages (whether experienced first or 
second) was followed at the recall stage either by maintaining or reversing the order 
of passage content of the study stage. Analysis of variance of immediate recall 
scores of the history content (F(3,99) = 0.19, P = 0.91) revealed non-significant 
differences among the four different order groups as did a similar analysis for the 
story recall (F(3,99) = 1.12, P = 0.35). Delayed recall of history (F(3,99) = 0.04, 
P = 0.99) and of story material (F(3,99) = 1.76, P = 0.16) also produced 
non-significant differences. On the basis of these analyses it was concluded that order of 
presentation and recall had no discernible effect upon memorisation of the 
separated texts. For convenience of presentation, in the analysis which follows, the 
four separated text groups are combined into a single ‘‘separated’’ condition. 





TABLE 2 
MEAN RECALL SCORES (PER CENT IN BRACKETS) 

              Distributed    Consolidated    Separated       P 
Immediate: 
  History      9.05 (53)     10.27 (60)     12.40 (73)     0.001 
  Story       13.00 (77)     11.73 (69)     13.77 (81)     0.003 
Delayed: 
  History      9.00 (53)      9.86 (58)     12.37 (73)     0.001 
  Story       12.46 (73)     11.32 (67)     12.74 (73)     0.03 


Examination of the immediate recall scores of the history information in Table 
2 indicates that the three different text types led to varying levels of recall and 
analysis of variance (F(2,144) = 15.10, P < 0.001) confirms this variation. Further 
examination of the differences among means, using the Newman-Keuls range test 
(set at P = 0.05) revealed that the separated text condition produced a higher mean 
score than that in either of the other two conditions. The difference between the 
distributed and consolidated conditions did not, however, reach statistical 
significance. In the case of immediate recall of story information the three mean 
recall scores also varied significantly (F(2,144) = 6.066, P = 0.003) although to a 
lesser extent than was the case with the history material. Newman-Keuls range 
testing of the story means showed that the consolidated text condition produced a 
significantly lower mean score than in either of the other two conditions which did 
not differ significantly from each other. The pattern of results emerging in the 
immediate recall scores was sustained at delayed testing. History scores varied 
significantly (F(2,144) = 17.58, P = 0.001) with the separated condition again 
proving superior to both consolidated and distributed conditions but with the latter 
two conditions not differing significantly. Story recall scores also varied 
significantly (F(2,144) = 3.52, P = 0.03) with the consolidated condition being 
lowest but with non-significant differences emerging between the separated and 
distributed conditions. 


The substantial and relatively enduring nature of recall differences among the 
three versions of text is measured more clearly in terms of percentage (rounded) 
correct recall. Thus, taking the most extreme differences, history recall (both 
immediate and delayed) is 20 per cent poorer in the distributed than in the separated 
condition. Story recall is less affected by the version of text experienced. 
Nevertheless, immediate recall of story in the consolidated condition is 12 per cent 
below that of the separated condition though reducing to 6 per cent between the 
latter and the other two conditions by the delayed recall stage. As a summary of 
results, it can be stated that story recall from the separated version of the texts was 
as good, or better, than that from the mixed versions and history recall was 
markedly superior. 
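
The percentage comparisons in this paragraph can be checked against the rounded percentages reported in Table 2. The following is a minimal sketch only: the dictionary layout and the function name `gap` are illustrative assumptions and not part of the study.

```python
# Rounded percentage-correct recall as reported in Table 2, keyed by
# (testing occasion, material) and then by text condition.
RECALL_PCT = {
    ("immediate", "history"): {"distributed": 53, "consolidated": 60, "separated": 73},
    ("immediate", "story"): {"distributed": 77, "consolidated": 69, "separated": 81},
    ("delayed", "history"): {"distributed": 53, "consolidated": 58, "separated": 73},
    ("delayed", "story"): {"distributed": 73, "consolidated": 67, "separated": 73},
}


def gap(occasion, material, condition, baseline="separated"):
    """Percentage-point shortfall of `condition` relative to `baseline`."""
    row = RECALL_PCT[(occasion, material)]
    return row[baseline] - row[condition]


if __name__ == "__main__":
    # Print the shortfall of each mixed-text condition against the separated texts.
    for (occasion, material) in RECALL_PCT:
        for condition in ("distributed", "consolidated"):
            print(occasion, material, condition, gap(occasion, material, condition))
```

On these figures the history shortfall in the distributed condition is 20 percentage points at both testing occasions, and the story shortfall in the consolidated condition falls from 12 points immediately to 6 points at delayed testing.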


DISCUSSION 
The finding of the study is that, among the 10-year-olds of the sample, 
wrapping up information in a story form is a relatively less successful means of 
communication than is the delivery of story and information material in the form of 
separate texts. 
In attempting to interpret the results of the study it is helpful to recall the 
emphasis placed by developmentalists (e.g., Naus, 1982) on the characteristics of the 
stimulus materials and their consequential effect upon the type of processing activity 
that is engaged. In the case of text learning also, as Kintsch (1982) has observed, the 
passage may clearly ‘‘announce itself’’ to be of a certain type (a story, an 
information text, a humorous article, and so on) and thus initiate a particular type 
of processing (such as following the plot, searching for and storing information, 
sharing in the author's sense of humour). Nevertheless, as Schallert (1982) has 
clearly demonstrated with passages of a single (but ambiguous) theme, variations 
among readers in the particular knowledge base they bring to the text may lead to 
quite different interpretations of the meaning of the passage and the intentions of 
the author. 


Even more so in the case of a mixed type text, such as that employing a story 
theme to carry with it additional more didactic type information, there is the danger, 
particularly in the case of the less sophisticated reader, that the passage may 
misannounce itself as simply a ‘‘story type’’ text leading to a concentration upon 
following the narrative aspect of the passage to the consequent neglect of the 
information elements. Whereas it is undoubtedly the case that even the 
comparatively young reader may have an understanding of the different goals of 
reading (Meyers and Paris, 1978) nonetheless there may be an incomplete awareness 
that a single passage may contain a number of themes each of which may require the 
reader's attention. Even where special instructions are given at the outset (such as in 
the present study to ‘‘try to remember all of the material in the passage’’) there may 
be a failure to comply (Rothkopf and Billington, 1975) or there may be a gap 
between metamemory and memory behaviour (Flavell and Wellman, 1977) in 
sustaining the intention to do so right throughout the study period. An additional 
consideration, of particular importance in the case of a ‘‘story-plus-information’’ 
text, is that the reader's ‘‘sense of story’’ may begin to take control of attention and 
the narrative material, owing to its likely greater intrinsic interest, may attract more 
of the available processing resources to the relative disadvantage of the more 
didactic, information element of the passage. Following this line of reasoning, it 
may be inferred that attentional bias, disfavouring the history material, was a major 
cause of the significantly poorer recall of that type of material from the mixed texts 
in comparison with that from the separated history text version. 


If, however, it is assumed that in the three learning conditions an equal amount 
of cognitive effort was generated, it follows that the diminished level of attention to 
the history material within the mixed text conditions should have been compensated 
by an augmented and therefore higher level of attentional effort to the story material 
than was the case in the separated version. Thus, on this basis, story recall from the 
mixed texts ought to have been better than that from the separated text, yet this was 
not the case. An alternative explanation to that of selective attention (due to initial 
set or to greater interest generated by the story) and which could be used to explain 
recall differences among the three conditions in both story and history content 
would seek to account for these differences in terms of the extra-task demands 
imposed on the reader by the requirement in the mixed texts to disentangle and 
separately collate the material of the two intermingled themes. Such extra-task 
demands would, it can be surmised, act to reduce the total available resources in the 
mixed text conditions and thus depress performance in both history and story recall. 
It can be further surmised that the potential availability of an appropriate structural 
schema for the story would offer a degree of protection against the interruptive 
effects of intrusive heterodox elements upon story comprehension and that the lack 
of such a protective framework in the case of the less easily organised history 
elements caused recall of the latter to suffer more as a consequence. Following still 
further the particular line adopted here the poorer story recall in the consolidated 
condition may be attributed to the particularly disruptive effect upon the reader of 
having the narrative thread of the story broken midway through the text by the 
intrusion of the solid block of history material which itself comprised half of the 
total passage length. 


In the discussion two possible influences, attentional set and resource depletion 
arising from extra-task demands, have been put forward as potential contributors to 
recall differences among the three versions of the text. It is probably more 
reasonable, however, to envisage these two, not as rival explanations but rather as 
working simultaneously to influence recall performance from the mixed type text. 
Thus one can conceive of performance on the history material in the mixed texts 
suffering both from the effect of attentional bias and from a further depletion due 
to the drain on overall attentional resources caused by the extra demands of mixed 
text processing in identifying and collating the specific history material embedded in 
the passage. In the case of the story material, however, the favouring of this theme 
in terms of processing resources is countered by the extra-task demands of the mixed 
text so that in the distributed version the net amount of available resources is no 
higher than that in the separated story text (and hence the non-significant difference 
in story recall performance between the two conditions). In the consolidated 
version, however, as argued previously, the particular disruption to the story theme 
in that text type drew off even more of the available resources for comprehending 
and storing the narrative material and thus led to a net allocation which fell below 
that in either of the other two conditions (thus leading to the significantly poorer 
story recall in the consolidated condition). 


Devices to attract the reader’s attention to didactic type information by 
packaging the material in the more beguiling form of story or biography seem 
increasingly popular in instructional settings. Extensions to this approach appear in 
other less formal learning settings as in radio and television documentaries which 
dramatise current affairs (so-called ‘‘faction’’). From a theoretical perspective it 
would be of interest to investigate further how metamnemonic understanding of the 
nature and purposes of more complex texts (and other more indirect methods of 
conveying information) develops and with it the capacity for handling the problems 
of identifying and pursuing more than a single goal of remembering. A clearer 
understanding of these issues would help in the designing of more effective versions 
of multi-purpose texts and instructional programmes and assist in the devising of 
methods of encouraging the learner to develop and sustain appropriate learning 
strategies for such tasks. 





Requests for reprints should be addressed to Dr. J. M. Bill, Department of Education, 
Queen's University of Belfast, Belfast, BT7 1NN, N. Ireland. 


EFFECTS OF TEXT ILLUSTRATION ON CHILDREN’S 
LEARNING OF A SCHOOL SCIENCE TOPIC 


By D. J. REID AND M. BEVERIDGE 
(Department of Education, University of Manchester) 


Summary. The effect of text and illustration on the learning of a school science topic by 
13-year-old children was investigated. 272 children studying integrated science in the 
second year of two comprehensive schools were given texts with varied picture content. 
Learning was measured by a criterion-referenced objective items test which differentiated 
between the effects of pictures on their own, text on its own, the general effect of pictures 
when added to text and the specific effect of pictures associated redundantly with parts of 
the text. The results indicate that there is no general motivational effect of pictures on the 
learning of text, but that with higher ability levels the effect of specific pictures is 
beneficial whilst with less able children they distract. In addition, there is some evidence 
to indicate that when materials are presented in traditional worksheet mode they might be 
learned more efficiently than the same materials presented in microcomputer mode. 


INTRODUCTION 


RESEARCH currently indicates that under some conditions pictures can enhance recall 
of information from text (e.g., Rusted and M. Coltheart, 1979), but that this 
facilitatory effect does not occur consistently (Levie and Lentz, 1982). The research 
programme, of which the work reported here is a part, is aimed at specifying more 
precisely those conditions under which pictures facilitate text processing. One focus 
in this study is on the relationship between the “difficulty” of the text and the 
capacity of the pictures to assist learners. There are two dimensions to the 
“difficulty” of text. Firstly there is the objective difficulty which is a feature of the 
text itself. At the syntactic and lexical level, this is typically measured by readability 
formulae such as the Fry, Flesch and SMOG procedures (Reid, 1984a). However, no 
text will prove equally demanding for all children. The more able and educationally 
advanced will find it easier to extract appropriate information (Vernon, 1953). The 
second dimension to the difficulty of text lies outside of the text itself and is a 
feature of the reader. The absence of any attempt to incorporate measures of 
children's ability may well account for the equivocal evidence on the effect of text 
difficulty and pictures on learning (Reid et al., 1983). The present study is concerned 
to look at the relationship between picture facilitation, children's ability and text 
difficulty. 


The method responds to Donald's (1979) plea for research in this area to use a 
picture-prose paradigm rather than the paired-associate method by which the picture 
superiority effect is more consistently found (Nelson et al., 1976). And like Willows 
(1979) we present children with materials in which some information is available 
both in pictures and text. But we also include information available only in pictures 
or text. This technique allows assessment of both a general motivational influence of 
pictures and their effect on the retention of specific items of information. 


The present study also investigates the effect of pictures in two different modes 
of presentation. We are particularly interested in the comparability of 
microcomputer and worksheet presentations because the microcomputer can store, 
and also control if required, information about sequence and length of time of text 
and picture presentations. Comparison of the two modes of presentation has, 
therefore, implications for both educational software and research into picture-text 
processing. 


METHOD 

Sample 

272 13- to 14-year-old children from two comprehensive schools in inner city 
Manchester participated in the study. All were following the same integrated science 
syllabus (ILEA, 1978), and were in their second year of secondary education. Both 
schools served large but different housing estates, and organised their junior science 
in similarly sized mixed ability classes. None of the children had been previously 
taught the topic under study. 


Materials 

The biological topic ‘‘Grass Consumption’’ was written at two levels of 
readability as measured by the Fry, Flesch and SMOG formulae. The ‘‘easy’’ form 
contained a total of 447 words and gave readability levels of 11, 11, and 13.3 
respectively. The ‘‘difficult’’ form contained 485 words and gave readability levels of 17, 
15-16 and 18 respectively. Both written forms of the topic contained an identical 
amount of information, and each form was presented with identical pictures in the 
same positions relative to the text. The five pictures were (a) a Venn diagram showing 
examples of the foods of omnivores, carnivores and herbivores; (b) a high power 
section of several cells from the leaf of grass to show the cellulose cell wall and silica 
crystals in the cytoplasm; (c) an outline diagram of the skull of a sheep showing the 
position of the teeth; (d) a longitudinal section through a cheek tooth of a sheep; and 
(e) the mandibles of a locust, showing their cutting and grinding edges. 


The topic was presented in one of four forms, easy readability with pictures as 
described, easy readability with no pictures, difficult readability with pictures and 
difficult readability without pictures. Each of these forms was produced both in 
traditional school worksheet format and also on a BBC 32K microcomputer. This 
latter mode of presentation was interactive in that a child could read a screen of text, 
and then call up the corresponding picture at will by pressing the appropriate key. 
Once the subject had worked his way through the program, a menu was presented 
which allowed him to return to any screen, text or picture for as long and as 
frequently as he wished. 


Assessment instrument 

The extent to which the children extracted and retained information from the 
various topic forms was assessed by a specially developed objective items test. 
Criterion-referenced (Black and Dockrell, 1984), it contained 30 questions covering 
the domain. It was validated by three judges for item content, and 
piloted for before and after scores using the φ coefficient (Cohen and Holliday, 
1982). As a result, 27 of the original 30 questions were selected for use, nine testing 
the pupils’ learning of information presented only in picture form, a further nine 
questions which tested for the learning of information only available in text, and 
nine further questions which tested for the learning of information which was 
presented both in picture form and in the text. Table 1 gives two examples of each 
type of test item and where appropriate the corresponding texts for easy and 
difficult forms. 

In terms of picture availability, the three question types in the instrument 
described in Table 1 allow learning under six conditions of picture and text to be 
measured. These are summarised in Table 2 as cells A, B, C, D, E and F. 
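
The way three question types and two picture conditions yield six treatment cells can be sketched as a simple lookup. This is an illustrative sketch only: the cell letters follow the key to Table 2, while the names `CELLS`, `CONTROL_FOR` and `treatment_cell` are assumptions introduced for the example.

```python
# Each cell is defined by the kind of information a question tests and by
# whether the child's learning materials carried pictures (True) or not (False).
CELLS = {
    ("pictures only", True): "A",
    ("pictures only", False): "B",
    ("text only", True): "C",
    ("text only", False): "D",
    ("pictures and text", True): "E",
    ("pictures and text", False): "F",
}

# Each pictures-present cell is paired with a pictures-absent control cell.
CONTROL_FOR = {"A": "B", "C": "D", "E": "F"}


def treatment_cell(question_type, pictures_present):
    """Return the cell letter for a question type under a picture condition."""
    return CELLS[(question_type, pictures_present)]


if __name__ == "__main__":
    for (question_type, present), cell in sorted(CELLS.items(), key=lambda kv: kv[1]):
        label = "with pictures" if present else "without pictures"
        print(cell, question_type, label)
```

Comparing a cell with its control (A with B, C with D, E with F) isolates the effect of pictures for one question type at a time.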


Procedure 

The authors have shown elsewhere (Reid et al., 1986) that children’s 
performance in school science is positively associated with the skills they show in 
perceiving information in biological pictures. The present study takes this into 




TABLE 1 
EXAMPLES OF TEST ITEMS AND CORRESPONDING TEXTS FOR EASY AND DIFFICULT READABILITY FORMS 


* * * Information available from pictures only * * * 

Objective item: 
    The cutting edges of the locust mouthparts are found 
    A on the front of the mandible 
    B on the top of the mandible 
    C at the back of the mandible 
    D on the fleshy part of the mandible 
Relevant quotation from text: 
    no counterpart in the text of either readability form 

Objective item: 
    In grass, silica crystals are found 
    A in the cell walls 
    B on the cell surface 
    C in the cell cytoplasm 
    D in the cell cellulose 
Relevant quotation from text: 
    no counterpart in the text of either readability form 


* * * Information available from the text only * * * 

Objective item: 
    The ways in which animals are adapted for feeding depends mainly upon 
    A where they look for food 
    B the type of food they eat 
    C the amount of food they eat 
    D the length of time spent looking for food 
Relevant quotation from text: 
    Easy readability: “They are adapted or well-suited to this in many ways, 
    depending on the type of food they eat”. 
    Difficult readability: “Animals may be regarded as adapted or well-suited to 
    the function of eating depending upon the type of food they eat”. 

Objective item: 
    The mandibles are protected from being worn away by grass because of the 
    A position of the mandibles on the head 
    B presence of enamel 
    C presence of special protein, chitin 
    D action of the other mouthparts 
Relevant quotation from text: 
    Easy readability: “On the ridges this cuticle is made hard by a special 
    protein called chitin which stops the ridges being worn away”. 
    Difficult readability: “A layer of cuticle covers the insect's whole body, and 
    on the ridges this substance is hardened by a special protein chitin, 
    preventing wearing away of the mandible”. 


* * * Information available from both text and picture * * * 

Objective item: 
    The sharp incisor teeth in sheep are found at the 
    A back of the lower jaw 
    B front of the lower jaw 
    C back of the upper jaw 
    D front of the upper jaw 
Relevant quotation from text: 
    Easy readability: “At the front of the lower jaw of the sheep are sharp 
    incisor teeth”. 
    Difficult readability: “The sheep’s sharp incisor teeth are situated at the 
    front of the lower jaw”. 

Objective item: 
    Animals that eat only plants are called 
    A carnivores 
    B parasites 
    C omnivores 
    D herbivores 
Relevant quotation from text: 
    Easy readability: “Sheep and locusts eat just plants and are called 
    herbivores”. 
    Difficult readability: “Sheep and locusts consuming only vegetable matter are 
    called herbivores”. 


Note: Recognition phrases are italicised here, but were not italicised in the original test or text. 
TABLE 2 
TREATMENT CELLS 

Objective items test questions 
measuring information available in      Pictures present    Pictures absent 

pictures only (P)                       A (P+)              B (P−) 
text only (T)                           C (T+)              D (T−) 
both pictures and text (P+T)            E (P+T+)            F (P+T−) 


KEY: 

Cell A measures the effect of pictures only on children’s learning. The information required to answer 
these questions is unavailable in the text. 

Cell B acts as a control to cell A. Questions in this treatment should be largely unanswerable since the 
text contains no pictures. 

Cell C measures the general effect of pictures upon the retention of information available only in the 
text. 

Cell D acts as a control to cell C. 

Cell E measures the effect of pictures on specific text content. The information required by these 
questions is present in two forms, once in the picture and again in the text. 

Cell F acts as a control for cell E. 


account by using science ability as one of the independent variables. The children 
from each school were ranked for science ability on the basis of a common within- 
school examination, and divided into four ability groups. These were designated 
‘‘superior’’, ‘‘above average’’, ‘‘below average’’ and ‘‘inferior’’. The four forms of 
the topic (easy with pictures, easy without pictures, difficult with and without 
pictures) written in paper mode were distributed to 160 pupils in school A equally by 
ability. The learning materials, instructions to pupils, the test instrument and the 
research setting are essentially identical to those found in normal classroom learning 
situations. 


112 children from school B attended the University Department of Education 
computer laboratory in groups of 10. These children were also randomly allocated 
to experimental conditions subject to the constraint that equal numbers of each 
ability range received each of the four forms of presentation. For each mode of 
presentation the children were allowed as much time as they required to complete the 
exercise, which was never longer than one hour. 
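
Random allocation under the constraint of equal numbers per ability group and form can be sketched as a shuffle within each ability group followed by dealing children to the four forms in turn. This is a hypothetical illustration, not the authors' documented procedure: the form labels, the function name `allocate` and the assumption of 28 children in each ability group are all introduced for the example.

```python
import random
from collections import Counter

# Hypothetical labels for the four forms of the topic.
FORMS = ["easy, pictures", "easy, no pictures",
         "difficult, pictures", "difficult, no pictures"]


def allocate(children_by_ability, forms=FORMS, seed=None):
    """Randomly assign children to forms, balanced within each ability group."""
    rng = random.Random(seed)
    allocation = {}
    for ability, children in children_by_ability.items():
        if len(children) % len(forms) != 0:
            raise ValueError("group size must be a multiple of the number of forms")
        shuffled = list(children)
        rng.shuffle(shuffled)  # random order within the ability group
        for position, child in enumerate(shuffled):
            # Dealing in rotation gives every form the same number of children.
            allocation[child] = (ability, forms[position % len(forms)])
    return allocation


if __name__ == "__main__":
    # Assumed example: 112 children split evenly over four ability groups.
    abilities = ["superior", "above average", "below average", "inferior"]
    groups = {a: [f"{a}-{i}" for i in range(28)] for a in abilities}
    counts = Counter(allocate(groups, seed=1).values())
    for key, n in sorted(counts.items()):
        print(key, n)
```

Under the assumed group sizes each combination of ability and form receives seven children, whatever the seed.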


Pilot studies with the computer mode had indicated that the children needed 
time to learn the user-friendly instructions provided by the computer, informing 
them of how to change the display. The children receiving the microcomputer 
presentation were first given a specially written trial program on the topic of the 
solar system which operated for 15 minutes. The instructions on the screen were 
identical to those they met later in the ‘‘Grass Consumption’’ topic, and, after a few 
minor difficulties the children were able to operate the program. They then moved 
on to the test materials. 


During their self-paced operation of the test program the children were asked to 
complete a cloze test on the topic. This device had proved useful in previous work as 
a means of focusing children's attention on the information presented. Upon 
completion of the exercise to the satisfaction of the individual child, the objective 
items test was distributed and answered. Three months later, upon the return of the 
children to school after the summer holidays, each child repeated the test. 


RESULTS 


The data are analysed in two main blocks, one block presenting the paper mode 
results (from school A) and the other the computer mode results (school B). Initial 
data inspection indicated a very similar pattern and level of scores for the two 


TABLE 3 
MEANS AND STANDARD DEVIATIONS OF SCORES MADE BY CHILDREN ON PAPER MODE (CELLS CORRESPOND 
TO THOSE IN TABLE 2) 

[Table values not recoverable. Rows: question type (pictures only, text only, pictures plus text) by ability (superior, above average, below average, inferior). Columns: mean, sd, mean, sd.] 

TABLE 4 
MEANS AND STANDARD DEVIATIONS OF SCORES MADE BY CHILDREN ON COMPUTER MODE (CELLS 
CORRESPOND TO THOSE IN TABLE 2) 

[Table values not recoverable. Rows: question type (pictures only, text only, pictures plus text) by ability (superior, above average, below average, inferior).] 
FIGURE 1 
GRAPHS SHOWING CHANGES IN LEARNING FOR PAPER AND COMPUTER MODES IN THE ABSENCE (-) AND 
PRESENCE (+) OF PICTURES 

[Graphs not reproduced. Panels (i) to (iii) show paper mode and panels (iv) to (vi) show computer mode, for pictures only, text only and pictures plus text questions respectively. Vertical axis: score. Horizontal axis: pictures absent (-) or present (+).] 

S = superior ability          BA = below average ability 
AA = above average ability    I = inferior ability 

readability conditions (easy and difficult); consequently the decision was made to 
abandon this variable at the pre-analysis stage. 


A series of two-way analyses of variance (ability × presence or absence of 
pictures) was performed separately for the two modes of presentation and the three 
question types. Both these variables are fixed, and the dependent variable is the 
score made by each child on the various parts of the objective items test. Analysis of 
differences between means when the F-test leads to rejection of the null hypothesis is 
by the conservative Tukey two-tailed t-test (Edwards, 1954). 


The means and standard deviations of the scores made by the children working 
in paper mode are given in Table 3, and those in the computer mode in Table 4. 
Figure 1 (i), (ii), and (iii) shows the paper mode data in graph form, and Figure 1 
(iv), (v) and (vi) shows the corresponding graphs for the computer mode data. 


In all six of these two-way ANOVAs, the ability variable is significant as a 
main effect at the P < 0.01 level, the F values ranging from 8.08 to 19.375 (df = 
3,152). Picture appears as a significant main effect once, in paper mode and for 
picture-only questions (F = 14.925; df = 1,152; P < 0.01) (see Figure 1 (i)). A 
significant interaction between ability and the presence of pictures also occurs once 
only, again in the paper mode and for the picture-plus-text condition (F = 5.18; df 
= 3,152; P < 0.01) (see Figure 1 (iii)). All other variance ratios fail to reach 
significance at the P < 0.05 level. 
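
The degrees of freedom quoted for the paper mode follow from the design: four ability groups crossed with two picture conditions and 160 children. A minimal sketch of the arithmetic for a fully crossed two-way fixed-effects analysis is given here; the function name `two_way_anova_df` and the dictionary keys are illustrative assumptions.

```python
def two_way_anova_df(n_subjects, levels_a, levels_b):
    """Degrees of freedom for a fully crossed two-way fixed-effects ANOVA."""
    df_a = levels_a - 1                            # main effect of factor A
    df_b = levels_b - 1                            # main effect of factor B
    df_ab = df_a * df_b                            # A x B interaction
    df_error = n_subjects - levels_a * levels_b    # within-cell error
    return {"A": df_a, "B": df_b, "AxB": df_ab, "error": df_error}


if __name__ == "__main__":
    # Paper mode: 160 children, 4 ability groups, pictures present or absent.
    print(two_way_anova_df(160, 4, 2))
    # Computer mode: 112 children with the same two factors.
    print(two_way_anova_df(112, 4, 2))
```

With 160 children this gives 3 and 152 for ability and for the interaction, and 1 and 152 for pictures. The same arithmetic with the 112 children of the computer mode gives an error term of 104.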


A second set of analyses was performed on data collected from the children 
three months after the original learning experience, the children having been asked 
to repeat the criterion-referenced objective items test after their summer holiday. It 
is not surprising that overall a marked decay had taken place (13.168 per cent; P < 
0.01). There is no evidence, however, that the presence or absence of pictures in the 
original learning programme had any effect on long term retention. 


DISCUSSION 
It is the results obtained by the children using the paper (worksheet) mode of 
presentation which focus most clearly on the main concern of this study, the effects 
of illustration on children's learning of a science topic. 


The paper presentation 

The means of the scores obtained by the children (N = 160) learning from this 
mode are given in Table 3, and it is this table which permits an interpretation of the 
effects of pictures on children's learning in terms of the model suggested in Table 2. 
Tables 3 and 4 are replications of Table 2 and include the cell lettering of Table 2 for 
ease of reference. 


Effect of pictures only on learning (cells A and B) 

It is an a priori expectation that information available in the pictures but 
unavailable in the text should lead to significantly improved learning under the 
conditions pertaining in cell A compared with those in cell B. The F test indicates 
that this is so (P < 0.01). Reference to Table 3 shows that the P− condition means 
(cell B) are consistently lower at all ability levels than the corresponding P+ means 
(cell A), although subsequent t-testing indicates that most of this variance is 
associated with the superior ability group. 


General effect of pictures on text (cells C and D) 
A comparison between these two cells indicates the general effect of pictures 
upon the children's retention of information in the text itself. In cell C the 
information to be learned is available only in the text, but the material is reinforced 
with pictures. These pictures are germane to the general context but of themselves 
contain no information directly relevant to the answering of these particular 
questions. In cell D, the questions asked are identical to those in cell C, but since the 
text has no pictures added there can be no general picture effect. Cell D therefore 
acts as control to cell C. The overall means between cells C and D fail to show any 
significant differences, implying a lack of a general facilitating effect of pictures on 
children’s ability to retain information available in text, an effect which is constant 
across the range of ability levels. There have been few attempts in the literature to 
quantify a motivating effect of picture upon text retention, but such as there have 
been (e.g., Bock, 1983) reach conclusions similar to those of this study. 


Specific effect of pictures on text (cells E and F) 

In cell E the children are receiving information from the text which is reinforced 
by that information being repeated in the pictures. In cell F the same questions are 
put, but the children have had no access to the pictures; as in the above condition, 
cell F acts directly as a control to cell E. There is no overall effect of picture in this 
condition (F = 0.217; df = 1,152; NS), but there is a significant interaction between 
ability and picture (F = 5.18; df = 3,152; P < 0.01). Pictures appear to enhance the 
learning of the two superior ability groups but to retard the learning of the two 
inferior ability groups (Figure 1 (iii)). By subtracting the mean score of the children 
reading text alone from that of those children reading specifically illustrated text, 
and dividing this figure by the standard deviation of the text alone group, a measure 
of the effect size of pictures upon learning is obtained. These data are extracted from 
Table 3. For cells E and F the increase for the top two ability groups is 
(6.45 − 5.45)/1.568 = 0.638, representing an improvement of nearly two-thirds of a 
standard deviation, or 18.3 per cent in percentage terms. This is very close to the kind of 
result reported by Levie and Lentz in their meta-analysis of American literature 
(1982). Clearly such differences are of educational importance as well as of 
statistical significance. The reverse effect, that of pictures acting as distractors to 
less able pupils is also educationally important. Here the less able pupils working 
without pictures retained about 19 per cent (100 × 0.775/4.125) more information 
than the corresponding group supplied with pictures. If the less able groups find the 
text difficult to comprehend, they are not only unable to make up for this lack of 
comprehension by reference to the pictures, but the presence of the pictures actually 
reduces their performance with the text. This result suggests that the equivocal 
evidence for a picture facilitating effect (Reid, 1984b) could be accounted for in part 
by the failure of researches to take the ability of the children into account. 
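
The effect size and percentage figures quoted in this paragraph can be reproduced from the values given in the text. The sketch below uses only those quoted numbers; the function names `effect_size` and `percent_change` are illustrative assumptions.

```python
def effect_size(mean_with_pictures, mean_text_alone, sd_text_alone):
    """Difference between group means in units of the text-alone standard deviation."""
    return (mean_with_pictures - mean_text_alone) / sd_text_alone


def percent_change(difference, reference_mean):
    """A difference in mean score expressed as a percentage of a reference mean."""
    return 100.0 * difference / reference_mean


if __name__ == "__main__":
    # Top two ability groups, cells E and F: means 6.45 and 5.45, sd 1.568.
    print(round(effect_size(6.45, 5.45, 1.568), 3))
    print(round(percent_change(6.45 - 5.45, 5.45), 1))
    # Less able groups: a difference of 0.775 on a mean of 4.125.
    print(round(percent_change(0.775, 4.125)))
```

Expressing the difference in standard deviation units makes it comparable with results from other studies, while the percentage form conveys its practical size on the test itself.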


The computer presentation 

As indicated earlier, the research programme will be using materials presented 
on microcomputer in order to examine the strategies employed by children learning 
science through picture enriched text. In an attempt to investigate whether pictures 
have the same effect on children's learning when presented via microcomputers as 
they do when presented in the more typical worksheet form, 112 children in school B 
were presented with identical materials to those in school A except that they were 
written in computer mode. The general pattern of results is remarkable in its 
similarity to those of the paper presentation (see Table 4 and Figure 1). There is a 
general upwards trend in the scoring at all ability levels in the picture-only condition 
in the presence of pictures, although unlike the paper presentation it is not 
statistically significant (F = 2.765; df = 1,104; NS). As in the paper presentation, 
there is a lack of any general motivating effect of pictures upon the learning of text 
(F = 1.051; df = 1,104; NS). Again, as in the paper presentation, there is an 
improvement in the scores of the two superior ability groups in the presence of 
specific pictures, and a worsening of the scores by the least able groups in the 
pictures-plus-text condition. The overall interaction between ability and pictures 
fails to reach significance however (F = 1.089; df = 3,104; NS). 


The overall mean score made by children working in the traditional paper mode 
is significantly higher than that made by the children working in the computer mode 
(13.238 and 12.098 respectively; t = 2.212; df = 270; P < 0.025). A consideration
of reasons for this would be mere speculation, because of the possible confounding 
effect of school upon mode of presentation. However, it is important to bear in 
mind that in spite of the similarity in the pattern of results between the two modes, 
the computer may be exerting a depressive effect on children's learning. This has 
potentially important general ramifications in the school learning context. 


Finally, it is worth noting the failure of the readability variable to influence 
performance differentially in the presence or absence of pictures. This seems 
peculiar since ability, which we have been able to show to have a differential effect, 
should influence how ‘‘readable’’ the texts actually are for any individual child. We 
are forced to conclude that differences in the level of difficulty at which the texts are 
written should be indicated by differential learning in the presence of pictures. 
Evidence for this remains elusive when current measures of readability are used as 
indicators of text difficulty (Perera, 1980; Reid, 1984c). 


The results reinforce previous findings that for non-narrative, expository 
reading materials typical of school science (Davies and Greene, 1984), a picture-text 
interaction appears to take place. The most interesting finding from this study is that 
one of the conditions which controls the form which this interaction takes is the 
science ability of the pupils themselves. The present authors are not aware that this 
has been demonstrated for narrative, non-expository texts of the type more usually 
associated with researches in the field. 


Requests for reprints should be addressed to Mr D. J. Reid, Department of Education, 
University of Manchester, Oxford Road, Manchester, M13 9PL. 


REFERENCES 


BLACK, H. D., and DOCKRELL, W. B. (1984). Criterion Referenced Assessment in the Classroom.
Edinburgh: The Scottish Council for Research in Education.

BOCK, M. (1983). The influence of pictures on the processing of texts: reading time, intelligibility, recall,
aesthetic effect, need for re-reading. In RICKHEIT, G., and BOCK, M. (Eds), Psycholinguistic Studies
in Language Processing. Berlin: de Gruyter.

COHEN, L., and HOLLIDAY, M. (1982). Statistics for Social Scientists. London: Harper and Row.

DAVIES, F., and GREENE, T. (1984). Reading for Learning in the Sciences. Edinburgh: Oliver and Boyd.

DONALD, D. R. (1979). Effects of illustration on early oral reading accuracy, strategies and
comprehension. Br. J. educ. Psychol., 49, 282-289.

EDWARDS, A. L. (1954). Statistical Methods for the Behavioral Sciences. New York: Holt, Rinehart and
Winston.

ILEA (1978). Insight to Science. London: Addison-Wesley.

LEVIE, W. H., and LENTZ, R. (1982). Effects of text illustrations: a review of research. Educ. Comm.
Technol., 30, 195-232.

NELSON, D. L., REED, V. S., and WALLING, J. R. (1976). Pictorial superiority effect. J. exp. Psychol.:
Human Learning and Memory, 2, 523-528.

PERERA, K. (1980). The assessment of linguistic difficulty in reading material. Educ. Rev., 32, 151-161.

REID, D. J. (1984a). The picture superiority effect and biological education. J. Biol. Educ., 18, 29-36.

REID, D. J. (1984b). A three-in-one readability program for science worksheets. School Sci. Rev., 65,
560-569.

REID, D. J. (1984c). Readability and science worksheets in secondary schools. Res. in Science and
Technological Educ., 2, 153-165.

REID, D. J., BRIGGS, N., and BEVERIDGE, M. (1983). The effect of picture upon the readability of a school
science topic. Br. J. educ. Psychol., 53, 327-335.

REID, D. J., BEVERIDGE, M., and WAKEFIELD, P. (1986). The effect of ability, colour and form on
children's perceptions of biological pictures. Educ. Psychol., 6, 9-18.


RUSTED, J., and COLTHEART, M. (1979). Facilitation of children’s prose recall by the presence of pictures.
Memory and Cognition, 7, 354-359.

VERNON, M. D. (1953). The value of pictorial illustration. Br. J. educ. Psychol., 23, 180-187.

WILLOWS, D. M. (1979). Reading comprehension of illustrated and non-illustrated aspects of text. Paper
presented at the annual meeting of the American Educational Research Association, San Francisco.


(Manuscript received 19th June, 1985) 


Br. J. educ. Psychol., 56, 304-308, 1986 


INSPECTION TIME AND PRIMARY ABILITIES 


By COLIN COOPER, PAUL KLINE AND LISA MACLAURIN-JONES 
(Department of Psychology, University of Exeter) 


Summary. A measure of inspection time and measures of primary abilities drawn from
the Comprehensive Ability Battery (Hakstian and Cattell, 1976) were administered to 20
subjects, mostly undergraduates. Inspection time was shown to be related more to factors
of perceptual speed and visualisation than to the primaries most associated with fluid or
crystallised ability.

INTRODUCTION 

INDIVIDUAL differences in the speed of certain perceptual discrimination tasks have 
been shown to be correlated with intelligence test scores (e.g., Brand, 1980; 
Nettelbeck and Lally, 1976). Recent research, performed with samples not yielding 
an extreme range of IQ scores, has generally shown such correlations to be insufficiently
large to enable inspection time (I.T.) estimates to be considered as substitutes
for tests of general ability (e.g., Nettelbeck and Kirby, 1983; Irwin, 1984).
However, most studies (e.g., Irwin, 1984) report a correlation in the order of 0.3
between inspection time, measured in several modalities by several psychophysical 
techniques, and measures of general ability. This robust finding cannot be ignored 
by those interested in exploring the processes underlying intelligent behaviour, and it 
is of major theoretical interest because of its link with Eysenck's (1982) theory of 
intelligence. 


Most studies investigating the overlap between I.T. and abilities have been 
based on one or two fairly crude tests of ability — Raven's matrices, a Mill Hill 
vocabulary scale, or the like. Nowhere has there been any attempt to link I.T. to the 
main primary ability factors. This is an unfortunate omission, for it may be the case 
that the correlations between I.T. and crystallised or fluid intelligence (Cattell, 1971) 
are attributable to a small number of primary abilities which load factors of fluid 
and crystallised ability at the second or third order. Furthermore, if the primary 
abilities which are most highly correlated with I.T. are known, it becomes easier to
select cognitive tasks which may show overlap with I.T. measures in order to specify
I.T. in terms of a small number of known elementary cognitive processes. For 
example, if primary abilities such as associative memory and memory span were 
found to correlate highly with I.T. one might first examine one's procedure for 
estimating I.T. for possible artefacts (subjects confusing the response buttons, for 
example). If none were found one might select cognitive tasks such as memory 
scanning (Sternberg, 1969) thought to bear on these primaries, and examine their 
relation to I.T. With knowledge of only the correlation between I.T. and a higher
order factor such as fluid or crystallised ability, there is no rational basis for 
selecting these cognitive tasks rather than any others. 


The present study therefore determines the magnitude of the correlations 
between one measure of inspection time and 17 scales each measuring one of the 
more common primary ability factors. This will also reveal whether I.T. reflects 
fluid ability (g_f), crystallised ability (g_c) or one or more of the other secondaries
which are loaded by the primary abilities included here (Hakstian and Cattell, 1978). 


METHOD 
Subjects 


Ten females and 10 males, aged between 19 and 40 years (median 26), volunteered to
take part in this study. All but two were undergraduates at Exeter University, the
others being educated to GCE A-Level standard. All subjects had previously
volunteered to take part in a larger experiment involving the validation of ‘‘objective
tests’’ of personality (Kline and Cooper, 1984a) and it was in the course of that
experiment that their Comprehensive Ability Battery data were obtained.


Tests 

Scales from the Comprehensive Ability Battery (Hakstian and Cattell, 1976) 
were used as measures of 17 of the best defined primary abilities (Horn, 1976), and 
are detailed in Table 1. These scales are themselves correlated, and six second-order 
ability factors account for most of their variance (Kline and Cooper, 1984b). The 
traditional intelligence test factor ‘‘g’’ has been shown to split into g_f (fluid ability)
and g_c (crystallised ability; Cattell, 1971) in such an analysis, the other second-orders
being retrieval (a factor loaded by the Guilford-type creativity tests), perceptual 
speed, memory, and visualisation. 


Inspection time was measured by an adaptive psychophysical procedure — a 
computerised replication of Nettelbeck’s ‘‘Psychology III class’’ experiment
(Nettelbeck, 1982). The apparatus and procedure are described fully below. This 
test, which measures the smallest detectable time difference between the onset of one 
light and a group of others, was preferred to the more usual tachistoscopic 
comparison of line lengths for a number of reasons. Nettelbeck and Kirby (1983) 
report that apparent movement was noticed by subjects in the tachistoscopic line 
comparison task. Thus some subjects may use this as a cue for making a decision. 
Similarly some subjects may concentrate on the whole or part of just one ‘‘leg’’ of 
the mask, and decide whether it is the longer or shorter by detecting flicker when the 
mask is replaced by the stimulus. Thus there may be a number of strategies which
might be adopted to solve this task, the relationship of which to inspection time and
intelligence is unknown.


Apparatus 

Eight red light-emitting diodes (LEDs), each 5 mm in diameter, were mounted
horizontally in a row, 10 mm apart (centre-to-centre), on a black painted panel. This
array of LEDs was divided into two halves by a white vertical line. A yellow LED,
which acted as a warning signal, was mounted 10.5 cm above the intersection of this
line and the LEDs. These LEDs were interfaced to a computer (PET) running a
millisecond timer. This enabled either the LED immediately to the left of the vertical line
or that to its right to be illuminated before the other seven under software control.
Two hand-held response buttons were also interfaced to record for each trial the 
subject's decision of whether the light to the left or right of the vertical line was 
illuminated first. 


Procedure 

Comprehensive Ability Battery (CAB) scores were obtained as described by 
Kline and Cooper (1984b), subjects being tested in groups of 10 or fewer. This stage 
of testing took approximately three hours to complete. Inspection times were 
estimated 0-3 months after this date. 


To estimate inspection times subjects were tested individually, seated with their 
eyes 60 cm from the centre of the line of red LEDs (which subtended 7°). They were
given a handset holding two buttons marked ‘‘left’’ and ‘‘right’’. A standard tape-recorded
brief told subjects that 1s after a 1s presentation of the yellow warning
light, the red light immediately to the right or left of the central white line would be 
illuminated fractionally before the other seven. When the yellow light went out, 
subjects were asked to fixate the intersection of the white line and the row of LEDs. 
Subjects were to decide which of the two lights had come on first and press the 
appropriate button. It was stressed that this was not a reaction time task and that
subjects could take as long as they wanted to reach a decision. 
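
As a rough check on the viewing geometry, the visual angle of the LED row can be recomputed from the figures given in the Apparatus section. The sketch below assumes eight LEDs on 10 mm centres (70 mm between the outermost centres, or 75 mm edge to edge with 5 mm LEDs) viewed from 600 mm; the function name is illustrative only and is not part of the original procedure.

```python
import math

def visual_angle_deg(extent_mm: float, distance_mm: float) -> float:
    """Visual angle (degrees) subtended by an object of the given extent,
    centred on the line of sight, at the given viewing distance."""
    return math.degrees(2 * math.atan((extent_mm / 2) / distance_mm))

# Assumed geometry: 8 LEDs, 10 mm centre-to-centre, 5 mm diameter, 600 mm away.
centre_to_centre = visual_angle_deg(7 * 10, 600)      # outermost LED centres
edge_to_edge = visual_angle_deg(7 * 10 + 5, 600)      # outermost LED edges
```

Both values fall close to the 7° quoted in the text, so the reported angle is consistent with the stated dimensions under these assumptions.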


Following Nettelbeck (1982) the PEST (Parameter Estimation by Sequential 
Testing) routine of Taylor and Creelman (1967) determined the time interval 
between the onset of the LED immediately to the right or left of the vertical line — 
determined randomly — and the onset of the remaining seven red LEDs. All eight 
red LEDs then remained illuminated for 2s or until the subject made a response, 
whichever was the longer. Then followed a pause of 2s on all but every 16th trial 
(when this interval was extended to 30s to provide a rest period) after which the 
yellow warning LED was illuminated for 1s. Another presentation followed 1s later. 


The PEST routine was set to determine the time interval at which the subject
made the correct response 75% of the time, the parameter W (which determined the
accuracy of this estimate) being set to 1.8. Testing stopped when two successive
estimates differed by less than 0.005s, the subject's threshold then being taken as the
mean of these two estimates. The time difference between the onset of one of the two
central LEDs and the remainder was initially set at 0.4s, an easy discrimination
which ensured that subjects sat at least seven practice trials before being shifted to a
more difficult discrimination by the PEST routine.
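
The role of the parameter W can be illustrated with a minimal sketch of the sequential test that PEST applies at each stimulus level: the running count of correct responses is compared with the number expected at the target probability, and the level changes only when the difference exceeds W. This is a simplified illustration, not the authors' program; the function names are hypothetical, and the step-size rules of the full Taylor and Creelman routine are omitted.

```python
def wald_decision(n_trials: int, n_correct: int,
                  target: float = 0.75, w: float = 1.8) -> str:
    """Sequential test used at one stimulus level.

    Returns "harder" if performance is reliably above target,
    "easier" if reliably below, and "continue" otherwise.
    """
    deviation = n_correct - target * n_trials
    if deviation > w:
        return "harder"
    if deviation < -w:
        return "easier"
    return "continue"

def trials_until_harder(target: float = 0.75, w: float = 1.8) -> int:
    """Number of consecutive correct trials needed before the level is made harder."""
    n = 0
    while wald_decision(n, n, target, w) == "continue":
        n += 1
    return n
```

With a target of 75% and W = 1.8, seven error-free trials leave the test undecided and the eighth triggers a shift, which agrees with the statement that subjects sat at least seven practice trials at the initial easy setting.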


RESULTS 


Subjects took between 20 and 55 minutes to complete the task, as the convergence
criterion was stringent and the large value of parameter W meant that a large
number of trials was administered at each difficulty level. Slips of attention sometimes
shifted individuals away from the convergence point despite the compulsory
rest periods. Inspection time estimates ranged from 0.028 to 0.141s, with a mean of
0.093s.





TABLE 1
CORRELATIONS BETWEEN INSPECTION TIME AND PRIMARY ABILITIES (N = 20)

                                         Correlation    Corrected correlation
Test                                     with I.T.      with I.T.
V   Verbal ability                         0.11           0.17
N   Numerical ability                     -0.25          -0.36
S   Spatial ability                       -0.20          -0.29
Cs  Speed of closure                      -0.39*
P   Perceptual speed and accuracy         -0.53**
I   Inductive reasoning                   -0.24          -0.34
Cf  Flexibility of closure                -0.41*         -0.56**
Ma  Rote memory                           -0.27
Mk  Mechanical knowledge                  -0.07          -0.10
Ms  Memory span                           -0.07          -0.10
Mm  Memory for meaningful pairings        -0.36
Sp  Spelling ability                      -0.33          -0.46*
E   Aesthetic judgement                   -0.52**
Fs  Spontaneous flexibility               -0.17
Fi  Ideational fluency                    -0.41*
W   Word fluency                          -0.51*
O   Originality                            0.12

*P < 0.05, **P < 0.01, 1-tailed.

Pearson correlations between the inspection time estimate and each of the CAB
scores are shown in Table 1. As this study was performed on students who had been
selected on the basis of academic achievement, it is appropriate to correct some of
the correlations in Table 1 for the effects of restriction of range on fluid and
crystallised intelligence. The primaries that would therefore be affected are inductive
reasoning, spatial ability, memory span, and flexibility of closure (for g_f) and verbal
ability, numerical ability, and mechanical knowledge (for g_c).


Unfortunately, however, general population norms for the CAB scales are not 
available, either for the USA or Britain. Thus the figures in Column 3 of Table 1 
were computed by assuming that the ratio of the standard deviations for the 
population and student groups is 3:2 for each of these scales, and correcting the 
correlations accordingly (Guilford and Fruchter, 1978, p. 325). The ratio 3:2 is not 
arbitrary; it is the highest ratio reported between student and population groups in 
the Advanced Progressive Matrices manual (Raven, 1965). The figures in Column 3 
of Table 1 are therefore generous estimates of what the correlations between I.T. 
and the primary abilities might be in the general population. 
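
The correction described above can be sketched as follows. The code assumes the standard formula for direct restriction of range, which appears to be the one taken from Guilford and Fruchter since it reproduces the corrected values in Table 1 to within 0.01; the function name and default argument are illustrative rather than the authors' own.

```python
import math

def correct_for_range_restriction(r: float, sd_ratio: float = 1.5) -> float:
    """Estimate the unrestricted correlation from a restricted-sample correlation.

    sd_ratio is the assumed ratio of population SD to sample SD (3:2 in the text).
    """
    numerator = r * sd_ratio
    denominator = math.sqrt(1 - r ** 2 + (r ** 2) * sd_ratio ** 2)
    return numerator / denominator

# Uncorrected correlations from Table 1 (numerical, spatial, flexibility of closure).
corrected = [round(correct_for_range_restriction(r), 2) for r in (-0.25, -0.20, -0.41)]
```

Because the correction is monotonic in r and the assumed ratio is the largest reported in the Raven manual, the corrected figures act as upper bounds under this assumption rather than point estimates.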


DISCUSSION 


The uncorrected correlations in column 2 of Table 1 show that I.T. correlates
significantly with perceptual speed, aesthetic judgement, speed of closure, flexibility
of closure, word fluency, and fluency of ideas. Of these variables, only flexibility of
closure falls out on either g_f or g_c. Thus it appears that there is no simple link
between I.T. and these secondaries. This is borne out in column 3 of Table 1: even
when generous estimates are made of what these correlations might be in an
unselected sample, only spelling ability reaches significance in addition to flexibility
of closure. There is no evidence that primaries such as inductive reasoning and
spatial ability which regularly load the g_f secondary (and which have been shown to
do so using these tests with a superset of the present sample; Kline and Cooper,
1984b) correlate appreciably with I.T. Similarly numerical ability and mechanical
knowledge, which load g_c at the second order, are essentially unrelated to I.T.


A number of subjects reported that they were using apparent movement cues to 
arrive at a decision, thus removing one of the supposed advantages of this 
procedure. It is interesting to speculate whether this I.T. task relates significantly to 
critical flicker fusion frequency, a variable which Jensen (1980, p. 707) feels ‘‘should 
have important theoretic connections with g’’ but which has received little rigorous 
study in this context. 


This analysis reveals that it is imprudent to measure g_f or g_c by inspection time
tasks. I.T. correlates with those primary abilities concerned with visualisation and
speed rather than those associated with g_f or g_c, but the pattern of correlations
between I.T. and abilities shows that it is not possible to measure any of the pervasive
second-order ability factors (Horn and Cattell, 1966) by such means. For example, if
I.T. were measuring perceptual speed, one would expect variables such as spelling
and memory span to correlate with I.T. (Hakstian and Cattell, 1978; Kline and
Cooper, 1984b) which they do not. Similarly, if I.T. were influenced by the retrieval
secondary (ibid.), a correlation with the originality scale should be found. It is 
suggested that rather than considering which secondaries can be measured by I.T. it 
may be more fruitful to analyse I.T. in terms of a number of other elementary 
perceptual processes. 


Requests for reprints should be addressed to Dr. Colin Cooper, Department of 
Psychology, University of Exeter, Washington Singer Laboratories, Perry Road, Exeter, 
EX4 4QG. 
STUDENTS’ APPROACHES TO LEARNING IN AN
INNOVATIVE MEDICAL SCHOOL: A CROSS-SECTIONAL 
STUDY 


By RUFUS M. CLARKE 
(Faculty of Medicine, University of Newcastle, NSW, Australia) 


Summary. A slightly modified version of Entwistle and Ramsden’s Approaches to 
Studying Inventory was administered to first-, third- and final-year students in an
innovative medical school (response rate 84 per cent), to see whether the Inventory’s
constructs also apply to medical students, to see whether they might constitute a helpful 
framework for counselling students in academic difficulty, and to see whether the 
students’ responses reflected the educational philosophy of the school. 


The psychometric robustness of the instrument was confirmed, and its applicability 
to these groups of medical students was established. The instrument was not highly 
predictive of academic success, but seemed to offer a useful framework for student 
counselling. Student responses to the inventory were consistent with the school’s 
educational philosophy. 


INTRODUCTION 


For many years, research into education has sought to elucidate those factors 
associated with academic success and failure. Students’ study habits and the 
motivation to study are both important, but these attributes often explain a 
disappointingly low proportion of the variance in student performance. This may be 
because they fail to take into account not only more subtle individual differences 
between students but also the range of different educational and social contexts in 
which students learn. 


The last decade of thinking about student learning has seen a synergism 
between two new elements: first, a shift in perspective to consider learning as 
experienced by the student, not as seen by the teacher or researcher, and secondly, 
the clearer articulation of a range of student motivations to learn and student 
strategies for learning. The synthesis of these two elements has arisen from a series 
of studies, principally by three groups, in Gothenburg (Ference Marton and his 
collaborators), the United Kingdom (Noel Entwistle and his collaborators), and in 
Newcastle, Australia (John Biggs and his collaborators). Among them, using 
different approaches and different analytical instruments, the three groups have 
developed a reasonably coherent account of the different approaches of students to 
their learning tasks, well portrayed in Marton et al. (1984) and Biggs (1985b). 


The aims of the present study were three-fold. First, most of the work published 
hitherto in this area has not included medical faculties, and it was important to 
discover whether these new insights could be applied to undergraduate medical 
education. Secondly, it might be possible to use these insights to help individual 
students in academic difficulty to achieve a better understanding of themselves and 
the potential solutions to their problems; indeed the original interest in this topic 
arose from the similarity of Entwistle and colleagues’ (1979) descriptions of the 
‘‘pathologies of learning’’ to the performance of certain Newcastle students about
whose academic progress the Faculty was anxious. Thirdly, it was important to see 
whether the students in this innovative medical school displayed approaches to 
learning that were consistent with the school's educational goals. A thumbnail 
sketch of the school may help to clarify the nature of these goals. 
The Faculty of Medicine at the University of Newcastle accepted its first intake
of 64 students in 1978, and four cohorts of students have now completed the five-year
course. Students undertake problem-based learning in small groups, addressing
clinical and community problems which form the focus of their learning in both 
clinical and basic medical sciences (Engel and Clarke, 1979). The educational 
programme emphasises the processes of clinical reasoning and self-directed learning, 
and the application of knowledge rather than its retention and regurgitation. 
Methods of assessment of student progress and achievement are designed to foster 
these aims. Admission to the course (Vinson et al., 1979) is either on the basis of 
high academic achievement, or on the demonstration of academic competence 
together with possession of the personal qualities judged to be necessary for effective
medical practice. Almost half the entrants are not direct school-leavers, and
approximately half are women. There are no pre-requisite academic subjects. 
Evaluation of the graduates and their careers is proceeding, but, anecdotally, they 
are well accepted, even in the Sydney teaching hospitals populated largely by 
graduates of the two large older-established metropolitan medical schools. 


Earlier studies had shown that Newcastle medical students had a more
favourable view of their medical school as a learning environment (Clarke et al.,
1981; Williams, 1982) than those in other Australian medical schools; the question 
at issue was whether these differences in attitude were matched by differences in 
their approaches to learning. A comparison between the approaches to learning in 
108 Australian medical schools has been described elsewhere (Clarke and Newble, 
1985). 


The present paper examines the psychometric qualities of Entwistle's 
Approaches to Studying Inventory (Ramsden and Entwistle, 1981; Entwistle and 
Ramsden, 1983), and reports a cross-sectional study of Newcastle students in 1984, 
including the differences in approaches to learning between years of students, 
between the sexes, and between students of different ages and with varying previous 
experience of tertiary education. This last factor was included because nearly half of 
the students who enter this Faculty do so with previous experience of tertiary 
education, which could affect comparisons with other institutions whose students 
largely enter straight from school. 


METHOD 


Student inventory 

The Approaches to Learning Inventory (available from the author) was based 
on the 64-item Approaches to Studying Inventory of Entwistle and Ramsden (1983),
with minor changes in phraseology which were discussed with Entwistle (personal 
communication). These changes made the items more relevant to medical students' 
activities, or more descriptive of local conditions. Interpolated in this inventory were 
a further 16 items (making 80 in all) which addressed the issue of working in small 
groups, a salient feature of the Newcastle programme; these items were included for 
exploratory purposes, and are not analysed further in this paper. 


For the purposes of analysis, Entwistle and Ramsden (1983) grouped their questionnaire
items into 16 subscales (usually four items comprising a subscale), and the
subscales were grouped into three or four Orientations (see below). 


As in Entwistle and Ramsden's study, respondents could choose between five 
responses to each item: strongly agree (scoring 4), agree with reservations (3), 
disagree with reservations (1), strongly disagree (0), or the item does not apply to 
me/I cannot make a definite decision (2). The choices were printed in this order 
across the page, to encourage respondents to make a definite decision in one 
direction or the other. The inventory was pilot-tested with small groups of students
to check its comprehensibility and acceptability. 
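
The scoring scheme described above can be sketched as a simple lookup. The response labels used as dictionary keys and the function name are hypothetical; only the numerical scores come from the text, and any reverse-keying of items is not handled here.

```python
# Scores assigned to each response option, as given in the text.
RESPONSE_SCORES = {
    "strongly agree": 4,
    "agree with reservations": 3,
    "does not apply / cannot decide": 2,
    "disagree with reservations": 1,
    "strongly disagree": 0,
}

def subscale_score(responses):
    """Sum the item scores for one subscale (usually four items)."""
    return sum(RESPONSE_SCORES[response] for response in responses)

example = subscale_score([
    "strongly agree",
    "agree with reservations",
    "does not apply / cannot decide",
    "strongly disagree",
])
```

On this scheme a four-item subscale ranges from 0 to 16, with the neutral option contributing the midpoint score of 2 even though it was printed last on the page.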


All students enrolled in Years 1, 3 and 5 (the final year) of the Newcastle 
Medical School in 1984 were invited to complete the self-coded inventory during, or 
immediately before, scheduled class activities in mid-year, at least four weeks before 
any major examination. Those who were absent on the day were followed up by 
letter. Of the 182 students currently enrolled, 153 responded (84 per cent). To test 
the concordance of student and tutor perceptions of the student's approach to 
learning (i.e., in order to examine the concurrent validity of the inventory), students 
in Years 1 and 5 were asked for permission to consult tutors who had had substantial 
contact with the students during the year (the Year 3 programme did not give this 
substantial contact). The students were informed of the general thrust of the study 
and its independence from the Faculty's assessment procedures; a guarantee of 
confidentiality was given. 


To examine test-retest reliability, half the students in Year 1 were approached 
one month later, and asked to complete the inventory a second time. The other half 
of the Year 1 students were approached a further two months later, at the time of 
their end-of-year assessment, with a similar request. On each occasion 26 pairs of 
inventories were available for analysis (a total of 52 retest responses from the 
original 63 respondents). 


On the rare occasions when data were missing, students were traced in person 
and asked to complete the missing items. There were no missing items in the final 
data set. 


Tutor inventory 

It was hypothesised that a measure of concurrent validity might be obtained if 
some of a student's approaches to learning were visible to an outside observer who 
had had sufficient contact with a student to be likely to be able to form a reliable 
judgment. Accordingly, the items of the student inventory were scanned to see 
whether they might be so observable; 12 possible items were found, and incor- 
porated into an inventory for tutors, together with two items designed to elicit the 
tutor's overall view of the student, to examine for halo effects. 


For each consenting Year 1 student (N = 56), two tutors were approached, each
of whom had been with that student's group for approximately 48 hours of regular 
contact time during the current academic year. For each consenting Year 5 student 
(N = 39), two tutors were approached, each of whom had been supervising the 
student's clinical attachment during the preceding four weeks. 


Data analysis 

The data were entered on a VAX computer in the University of Newcastle's 
Computing Centre. Analysis was undertaken using standard statistical packages 
(Dixon, 1981) together with an item analysis programme developed by Dr. M. N. 
Maddock, Faculty of Education. 


RESULTS 
1. Item analysis and reliability of subscales 
Item analysis showed that there was a good range of responses to each item, 
and only five of the 80 items attracted a response of 2 (the item does not apply to 
me/I cannot make a definite decision) from more than 10 per cent of respondents. 


Three measures of subscale reliability were used: (i) the correlation coefficient
for each item score with its subscale score exceeded 0.33 (P < 0.001); (ii) for every
item, this correlation coefficient exceeded the correlation coefficient of the item
score with the total score; (iii) for 16 subscales the median value of Cronbach’s
coefficient alpha was 0.48 (range 0.29 to 0.74).
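
For readers unfamiliar with the third measure, coefficient alpha can be computed from item scores as in the minimal sketch below. It uses the usual variance-based definition; the function name and the small data sets are made up for illustration and are not the study's data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per item, each in the same respondent order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # each respondent's scale total
    sum_item_variances = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))

# Illustrative two-item scales: identical items, then uncorrelated items.
alpha_identical = cronbach_alpha([[0, 1, 3, 4], [0, 1, 3, 4]])
alpha_unrelated = cronbach_alpha([[0, 4, 0, 4], [0, 0, 4, 4]])
```

Alpha rises with both the inter-item correlation and the number of items, so modest values are to be expected from subscales of only four items.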


2. Test-retest reliability 

At each retest interval there was a good correlation between Orientation and
subscale scores on test and retest (median values for the linear correlation
coefficients for each of the Orientation scores were 0.75 at one month and 0.73 at three
months); medians for the 16 subscale scores were 0.68 (range 0.34 to 0.89) at one
month, and 0.70 (range 0.24 to 0.84) at three months. Of the correlation
coefficients calculated, two were less than 0.38 (the value at which P = 0.05). No
significant differences between test and retest mean scores on Orientations or
subscales were found, except for the Negative Attitudes to Studying subscale score
which rose from 3.0 ± 2.9 to 5.0 ± 3.9 (mean ± SD) (P < 0.05) in the students
retested at the time of their end-of-year assessments.


3. Concurrent validity 

Completed inventories were received from each of two tutors in respect of each 
of 56 Year 1 students, and, for Year 5 students, from two tutors in respect of 10 
students, and from one tutor in respect of a further 18 students (response rate 36 per 
cent). In each year, the inter-rater (tutor) reliability was poor, and tutors' responses 
correlated poorly with those of students. This method does not provide evidence of 
concurrent validity. 





TABLE 1
FACTOR LOADINGS OF APPROACHES TO LEARNING SUBSCALES
Factors
I II III IV
Eigenvalues 3.66 2.14 2.03 1.27
% of variance explained 23 13 13 8
Subscales 
Deep Approach DA 77 
Intrinsic Motivation IM 76 
Relating Ideas RI 74 
Evidence and Logic EL 74 
Comprehension Learning CL 59 36 —39 
Disorganised Approach to Study DApp 72 
Globe-trotting GT 68 26 
Negative Attitudes to Studying NA 63 
Fear of Failure FF 56 —25 
Strategic Approach SApp —55 32 28 
Operation Learning OL 78 
Improvidence Imp 30 70 
Surface Approach SA 35 62 
Syllabus Bound SB —43 56 28 
Achievement Motivation AM 75
Extrinsic Motivation EM 66

Decimal points and values less than 0.25 omitted.


4. Construct validity 
Factor Analyses 


Principal components factor analysis with rotation to oblique simple structure 
was undertaken on the subscale scores. Five eigenvalues greater than unity were 
obtained, and three-, four- and five-factor solutions examined. The four-factor 


TABLE 2
SYSTEMS FOR GROUPING SUBSCALES INTO ORIENTATIONS
(SEE TABLE 1 FOR TITLES OF SUBSCALES)

Ramsden & Entwistle (1981)    Ramsden (1984)             Present Study

Meaning Orientation           Meaning Orientation        Academic Orientation
(0.79)*                       (0.79)                     (0.78)
DA                            DA                         DA
RI                            RI                         RI
EL                            EL                         EL
IM                            IM                         IM
                                                         CL

Reproducing Orientation       Reproducing Orientation    Non-Academic Orientation
(0.56)                        (0.66)                     (0.70)
SA                            SA                         SA
SB                            SB                         SB
FF                            FF                         Imp
EM                            Imp                        OL

Achieving Orientation         Non-Academic Orientation   Negative Orientation
(0.47)                        (0.59)                     (0.65)
SApp                          DApp                       FF
-DApp                         NA                         DApp
-NA                           GT                         NA
AM                                                       GT
                                                         -SApp

Styles & Pathologies          Strategic Orientation      Achieving Orientation
(0.17)                        (0.35)                     (0.29)
CL                            AM                         AM
GT                            EM                         EM
OL                            SApp
Imp

                              Styles of Learning
                              (-0.55)
                              CL
                              OL

*Cronbach’s alpha for the Orientation.

solution explained 57 per cent of the variance, corresponded reasonably well with
those described by Ramsden and Entwistle (1981) and Watkins (1982, 1983), and
grouped the subscales in a way that made intuitive sense (Table 1). In particular,
Factor I seems to be highly reproducible in all these studies. On the basis of this
factor analysis, four Orientations were derived: Academic (Factor I), Non-Academic
(Factor II), Negative (Factor III) and Achieving (Factor IV); see Table 2.
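The percentages of variance quoted for the factor solution follow directly from the eigenvalues. The sketch below is illustrative only: it assumes that the analysis was run on the correlation matrix of the 16 subscales listed in Table 1, so that each eigenvalue divided by 16 gives that factor's share of the total variance. The function name is hypothetical.

```python
# Eigenvalues of the four retained factors, as reported in Table 1.
eigenvalues = [3.66, 2.14, 2.03, 1.27]

# Assumption: 16 subscales entered the analysis (the number listed in Table 1),
# so the total variance of the standardised subscales is 16.
n_subscales = 16


def percent_variance(eigenvalue, n_variables):
    """Share of total variance carried by one component of a correlation matrix."""
    return 100.0 * eigenvalue / n_variables


# Per-factor shares, rounded to whole per cent as in Table 1.
shares = [round(percent_variance(ev, n_subscales)) for ev in eigenvalues]

# Combined share of the four-factor solution.
total = round(percent_variance(sum(eigenvalues), n_subscales))

print(shares, total)
```

Under that assumption the rounded shares agree with the "% of variance explained" row of Table 1 and with the 57 per cent quoted in the text.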


Construct validity of Orientations 

Ramsden and Entwistle (1981) grouped the subscales into three Orientations 
that they named Meaning, Reproducing, Achieving, and one dimension called 
Learning Styles and Pathologies. 


Subsequently, Ramsden (1983) rearranged the subscales to produce four 
different Orientations, which in turn differ slightly from those which emerged from 
the factor analysis in the present study (Table 2). Data from the present study were 
therefore analysed in parallel according to each of these three systems of 
Orientations, to identify which (if any) offered the best construct validity. 


Values of Cronbach’s alpha, calculated for each of the Orientations produced 
by combining the subscales according to the three different systems, are set out in 
Table 2. Values of alpha for the Orientations according to Ramsden (1983) and the 
present study are generally higher than those for the Orientations in the original 
Ramsden and Entwistle (1981) formulation, indicating greater internal consistency 
of the performance of the subscales grouped according to the later systems, as well 
as their robustness across widely different samples of students. 
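For readers unfamiliar with the statistic, Cronbach's alpha for an Orientation can be computed from the scores on its component subscales. The following is a minimal sketch of the standard formula, not the computation used in this study; the scores are hypothetical and the function name is an illustration only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of component scores per respondent."""
    k = len(rows[0])                      # number of components (e.g. subscales)
    columns = list(zip(*rows))            # scores regrouped by component
    sum_component_var = sum(variance(col) for col in columns)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum_component_var / total_var)


# Hypothetical scores of five students on four subscales of one Orientation.
scores = [
    [12, 11, 10, 11],
    [9, 8, 9, 7],
    [14, 13, 12, 13],
    [10, 12, 9, 10],
    [7, 8, 6, 8],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Alpha rises towards 1 as the components vary together, which is why a higher value is read as greater internal consistency of the grouping.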


The intercorrelations between the Orientation scores are shown in Table 3. 
Again, the two more recently-developed Orientation systems show a negative
correlation between the meaning/academic and reproducing/non-academic


TABLE 3
CORRELATION COEFFICIENTS BETWEEN ORIENTATION SCORES FOR EACH SYSTEM OF ORIENTATIONS

Ramsden & Entwistle (1981) Orientations
                           Meaning     Reproducing     Achieving
Reproducing                  -28
Achieving                    -20          -29
Styles and Pathologies        14           39             -22

Ramsden (1984) Orientations
                           Meaning     Reproducing     Non-Academic
Reproducing                  -27
Non-Academic                 -17
Strategic                     10           11             -07

Present Study Orientations
                           Academic    Non-Academic    Negative
Non-Academic                 -27
Negative                     -19           42
Achieving                     10           11             -23

Decimal points omitted from Table.
For P = 0.05, r = 0.16; for P = 0.01, r = 0.21 (df = 151).
Orientations (or strategies for studying), and a strong positive correlation between a
negative/non-academic Orientation (motivation to study) and a reproducing 
Orientation (study strategy). The Orientation system developed in the present study 
also shows a strong negative correlation between the Negative and Achieving 
Orientations (strategies). Both the later systems comprise coherent and internally 
consistent ways of expressing the different Orientations. 
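The significance thresholds given beneath Table 3 (r = 0.16 for P = 0.05 and r = 0.21 for P = 0.01, with df = 151) can be approximated from the degrees of freedom alone. The sketch below assumes a two-tailed test and substitutes the normal distribution for the t distribution, which is a reasonable approximation at this sample size; it is an illustration, not the procedure used in the study.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(alpha, df):
    """Approximate two-tailed critical value of Pearson's r.

    Uses r = z / sqrt(z**2 + df), with the normal quantile z standing in for
    the t quantile (an approximation that is adequate for large df).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(z * z + df)


r_05 = critical_r(0.05, 151)
r_01 = critical_r(0.01, 151)
print(round(r_05, 2), round(r_01, 2))
```

To two decimal places the approximation reproduces the thresholds quoted beneath Table 3.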

In the subsequent analyses, the Ramsden (1983) system of Orientations will be 
used, to allow direct comparison with other published work, and to reduce the 
confusion that would be generated by the introduction of yet another set of 
Orientations. 


5. Predictive validity 
The predictive validity of the subscales and Orientations of the inventory was 
tested by linear correlation of questionnaire scores with the performance of students 


TABLE 4
CORRELATION COEFFICIENTS OF ASSESSMENT SCORES WITH SUBSCALE AND ORIENTATION SCORES

Year 1 Year 3 Year 5
(N = 59) (N = 43) (N = 42)
Deep Approach 38
Relating Ideas
Evidence and Logic
Intrinsic Motivation
Surface Approach -40
Syllabus Bound
Fear of Failure
Extrinsic Motivation -34 -43
Strategic Approach 40 52
Negative Attitudes to Studying -33 -48
Disorganised Approach to Study -34 -52 -43
Achievement Motivation 28
Comprehension Learning -42
Improvidence -35
Operation Learning
Globe-trotting -41
Orientation System 1
Meaning
Reproducing -49
Achieving -38
Styles & Pathologies -57
Orientation System 2
Meaning
Reproducing -41
Non-Academic -37 -57
Achieving 36
Orientation System 3
Academic
Non-Academic -35
Negative -41 -57 -34
Achieving

Decimal points omitted. Only correlation coefficients for which P < 0.05 are shown; those for which
P < 0.01 are underlined.

Orientation Systems 1, 2, 3: Orientations defined by Ramsden and Entwistle (1981), Ramsden (1984)
and the present study, respectively.
at the subsequent end-of-year assessments. For Years 1 and 3, the assessments had a
clinical component (mainly in the area of patient interviewing and physical 
examination) but the bulk of the assessment was written, the main instruments being 
the Modified Essay Question based on Hodgkin and Knox (1975), and the Objective 
Structured Clinical Assessment, based on the work of Harden and Gleeson (1979). 
Student performance on each item of these written instruments was scored 0 or 1 by 
the examiners, and the total score was taken to be the outcome variable for each 
student. 


In Year 5, a wider range of assessment instruments was employed, including a 
Medical Independent Learning Exercise (a 48-hour library search and report task — 
Feletti et al., 1984) and critical evaluation of published papers, as well as Clinical 
Supervisors’ Reports from each clinical attachment. The item scores from these 
instruments contributed to one of five separate Domains of assessment (Feletti et 
al., 1983) each of which had a graded score of 0-3. For Year 5 students, the overall 
Domain score (maximum 15) represents the outcome variable for the purposes of 
this paper. 


Correlations were calculated between subscale and Orientation scores, and the 


TABLE 5

APPROACHES TO LEARNING QUESTIONNAIRE: ORIENTATION AND SUBSCALE SCORES BY YEAR OF
ENROLMENT (MEAN ± STANDARD DEVIATION)

                                                     Year of Enrolment
                                        1             3             5            Total

Respondents/enrolled students         63/66         47/57         43/59         153/182
Meaning Orientation                 44.2 ± 8.5    47.1 ± 8.2    45.6 ± 8.6    45.4 ± 8.5
  Deep Approach                     11.7 ± 2.7    12.6 ± 2.5    11.9 ± 2.6    12.0 ± 2.6
  Relating Ideas                    11.5 ± 2.8    12.3 ± 2.5    12.0 ± 2.4    11.9 ± 2.6
  Evidence and Logic                10.5 ± 2.7    11.0 ± 2.8    11.2 ± 2.5    10.8 ± 2.7
  Intrinsic Motivation              10.4 ± 2.7    11.2 ± 2.9    10.5 ± 3.6    10.6 ± 3.0
Reproducing Orientation             30.7 ± 9.1    29.7 ± 8.8    29.7 ± 9.3    30.1 ± 9.0
  Surface Approach                  12.0 ± 4.0    11.6 ± 4.0    12.3 ± 4.3    12.0 ± 4.1
  Syllabus-Bound *                   7.0 ± 2.8     6.1 ± 2.8     5.5 ± 2.7     6.3 ± 2.8
  Fear of Failure                    5.2 ± 2.8     6.1 ± 2.8     6.2 ± 3.0     5.8 ± 2.9
  Improvidence                       6.4 ± 3.2     5.9 ± 2.9     5.7 ± 2.6     6.0 ± 2.9
Non-Academic Orientation            19.5 ± 7.3    21.2 ± 10.2   21.7 ± 8.5    20.7 ± 8.6
  Disorganised Approach to Study     8.4 ± 4.1     8.1 ± 4.5     7.2 ± 4.2     8.0 ± 4.3
  Negative Attitudes to Studying *** 3.6 ± 2.7     6.1 ± 4.3     6.6 ± 4.1     5.2 ± 3.9
  Globe-trotting                     7.5 ± 3.1     7.0 ± 3.6     7.9 ± 3.5     7.5 ± 3.4
Strategic Orientation               21.0 ± 6.4    20.9 ± 5.6    19.6 ± 5.6    20.6 ± 6.0
  Achievement Motivation *           8.2 ± 3.5     7.6 ± 3.7     6.4 ± 3.8     7.5 ± 3.7
  Extrinsic Motivation               2.4 ± 2.4     2.5 ± 2.6     3.0 ± 3.1     2.6 ± 2.7
  Strategic Approach                10.4 ± 2.9    10.8 ± 2.5    10.2 ± 2.6    10.5 ± 2.7
Learning Styles
  Comprehension Learning             9.0 ± 3.6     9.3 ± 3.7     9.2 ± 3.7     9.1 ± 3.6
  Operation Learning                 9.8 ± 3.1     9.3 ± 2.7     9.2 ± 3.0     9.5 ± 3.0

* One-way analysis of variance; difference between years, P < 0.05.
** P < 0.01.
*** P < 0.001.
assessment score appropriate to each year of students. The results for all years are
summarised in Table 4, from which it is clear that a rather global measure of 
successful performance at end-of-year assessments is better predicted by those 
subscales that address student motivation and affect than by those which address 
cognitive styles (with the possible exception of Year 3). None of the three 
Orientation systems is clearly superior in predicting student performance at 
assessment. 
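The linear correlation used to test predictive validity is the ordinary product-moment coefficient between a questionnaire score and the assessment outcome. A minimal sketch follows; the two lists of scores are hypothetical and are not data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: a subscale score and an end-of-year assessment total
# for six students (invented for illustration only).
negative_attitudes = [2, 4, 5, 7, 9, 11]
assessment_scores = [78, 74, 70, 69, 61, 58]

r = pearson_r(negative_attitudes, assessment_scores)
print(round(r, 2))
```

A negative coefficient, as in this invented example, means that higher subscale scores accompany lower assessment scores, which is the pattern reported in Table 4 for the affective subscales.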


6. Cross-sectional analysis by year 

One-way analysis of variance was undertaken on the subscale and Orientation 
scores by year (Table 5). Negative Attitudes to Studying showed a statistically
significant (P < 0.001) rising trend with seniority. Syllabus-Boundness and
Achievement Motivation showed significant (P < 0.01) downward trends. No other
statistically significant differences were detected.
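A one-way analysis of variance can be reconstructed from group sizes, means and standard deviations alone. The sketch below applies this to the Negative Attitudes to Studying row of Table 5, on the assumption that every respondent in each year (63, 47 and 43) contributed to that subscale; it illustrates the method and is not the original computation.

```python
def anova_f_from_summary(ns, means, sds):
    """One-way ANOVA F ratio from per-group n, mean and standard deviation."""
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-groups sum of squares: spread of group means about the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares: recovered from the group standard deviations.
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = len(ns) - 1
    df_within = total_n - len(ns)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Negative Attitudes to Studying, Years 1, 3 and 5, as read from Table 5.
f_ratio, df1, df2 = anova_f_from_summary(
    ns=[63, 47, 43],
    means=[3.6, 6.1, 6.6],
    sds=[2.7, 4.3, 4.1],
)
print(round(f_ratio, 1), df1, df2)
```

With these inputs the F ratio is roughly 10.6 on 2 and 150 degrees of freedom, which is in keeping with the highly significant year effect reported for this subscale.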


7. Effect of sex 


One-way analysis of variance was undertaken on the subscale and Orientation 
scores by sex (Table 6). Females scored significantly lower (P < 0.001) on Extrinsic
Motivation and the Strategic Orientation. Two-way analysis of variance showed
these effects to be independent of year. Females scored higher on Fear of Failure
(P < 0.05), with higher scores on this subscale particularly in Years 3 and 5.


TABLE 6

APPROACHES TO LEARNING QUESTIONNAIRE: ORIENTATION AND SUBSCALE SCORES BY SEX (MEAN ±
STANDARD DEVIATION)

                                                   Sex
                                    Female (N = 68)         Male (N = 85)

Meaning Orientation                  44.8 ± 9.0              45.4 ± 8.5
  Deep Approach                      11.8 ± 2.8              12.1 ± 2.5
  Relating Ideas                     11.7 ± 2.0              12.0 ± 2.5
  Evidence and Logic                 10.8 ± 2.7              10.8 ± 2.8
  Intrinsic Motivation               10.6 ± 2.9              10.5 ± 3.3
Reproducing Orientation              31.2 ± 8.4              28.7 ± 9.7
  Surface Approach                   12.3 ± 4.0              11.5 ± 4.2
  Syllabus-Bound                      6.2 ± 2.9               6.4 ± 2.9
  Fear of Failure                     6.3 ± 2.9      *        5.2 ± 2.9
  Improvidence                        6.4 ± 2.8               5.6 ± 3.0
Non-Academic Orientation             21.2 ± 8.0              20.0 ± 9.0
  Disorganised Approach to Study      8.1 ± 4.2               7.9 ± 4.4
  Negative Attitudes to Studying      5.4 ± 3.9               5.0 ± 3.8
  Globe-trotting                      7.8 ± 3.3               7.1 ± 3.4
Strategic Orientation                18.6 ± 5.7     ***      22.0 ± 5.9
  Achievement Motivation              6.7 ± 3.5               8.1 ± 3.7
  Extrinsic Motivation                1.8 ± 2.0     ***       3.2 ± 2.9
  Strategic Approach                 10.1 ± 2.9              10.7 ± 2.6
Learning Styles
  Comprehension Learning              9.3 ± 3.9               8.9 ± 3.6
  Operation Learning                  9.4 ± 3.3               9.5 ± 2.8

* One-way analysis of variance; difference between sexes, P < 0.05.
*** P < 0.001.
8. Effects of age and duration of tertiary study

One-way analysis of variance was undertaken on the subscale and Orientation 
scores by age. The most obvious trend was a decline in Achievement Motivation 
score (P < 0.05) in older students. Analysis by duration of tertiary education
revealed a similar decline in Achievement Motivation score, and also in
Syllabus-Boundness (P < 0.01), with increasing duration of tertiary education. Evidence and
Logic subscale scores rose (P < 0.05) with increasing duration of tertiary education.
Meaning Orientation score also showed an upward but not statistically significant 
trend, and Reproducing and Non-Academic Orientation scores trended downwards 
with increasing duration of tertiary study. 


DISCUSSION 


The results of this study support the applicability to an innovative under- 
graduate medical education programme of a general conceptual framework which 
has been developed in other tertiary faculties, as shown by the item analyses and 
correlations within and between subscales and Orientations. The factor analyses also 
exhibit a close resemblance to those arising from other studies both in Australia and 
Britain. Some of the differences may arise from differences of curricular content 
and context; for example, the undergraduate medical course is longer than many 
others, and engenders a high degree of professional socialisation, compared with, 
for example, an Arts degree course. Nevertheless, the consistent appearance of a 
factor reflecting a deep, meaning-seeking approach to study (Ramsden and 
Entwistle, 1981; Watkins, 1982, 1983; Biggs and Kirby, 1983) is a strong suggestion 
of the universality of this approach to learning. 


The study also confirms that, in this school as elsewhere, the most consistent 
predictor of academic performance resides in the affective domain of the inventory 
(e.g., Negative Attitudes to Study, Disorganised Approach to Study) rather than the 
cognitive, although Watkins (1982, 1983) found that the Meaning Orientation and 
some of its subscales also predicted actual examination performance. Cognitive 
attributes were also predictors in Ramsden and Entwistle's (1981) study, in which 
self-perceptions of academic performance, rather than actual examination results, 
were used as the outcome variable for prediction. 


The comparison of the three different Orientation systems suggests that those 
arising from the factor analyses undertaken by Ramsden (Ramsden and Entwistle, 
1983) and in the present study seem to be preferable to the original (1981) 
Orientations. To allow comparisons with other studies, Orientation scores of 
Ramsden (1983) have been emphasised, although it is likely that future work will 
employ a revised inventory. 


In this study, the cognitive aspects of the approach to learning generally fail to 
emerge as predictors of academic success. A self-reported leaning towards the 
Meaning Orientation, which subsumes the academic virtues that universities 
espouse, does not appear to confer any advantage in performance in this Faculty's 
system of assessment of student progress and achievement, but neither does a self- 
reported leaning to the Reproducing Orientation. Indeed, the Reproducing 
Orientation was a negative predictor for Year 5 examination performance (Table 4), 
as might be expected from the nature of the assessment instruments. 


Nevertheless, the reality of the construct underlying the Meaning Orientation is 
underlined by its persistent appearance in all the factor analyses conducted in this 
and other published studies. It is possible that the inventory fails adequately to tap 
these higher cognitive attributes, or that its global nature fails to differentiate 
between different approaches taken by students in response to different segments of 
the programme. An alternative explanation is that the end-of-year written
assessment instruments used in this Faculty do not adequately explore students’ 
cognitive processes. Although these instruments are intended to test clinical 
reasoning and the application of facts as well as factual recall, inspection of the 
component items shows the difficulty of constructing assessments which consistently 
explore the higher cognitive levels (Feletti and Smith, 1986). 


Studies of changes in approaches to learning during a course (either cross- 
sectional or longitudinal) are few; Watkins and Hattie (1983) found both Meaning 
and Reproducing Orientation scores to decline with seniority in a longitudinal study 
of university students, as did Biggs (1985a) in a cross-sectional study of male high 
school students. Coles (1984) found Meaning scores to fall, and Reproducing scores 
to rise, during first year study in a medical school with a traditional curriculum. 
Newble and Gordon (1985) found an increase in Meaning Orientation scores in a 
cross-sectional study of medical students at the University of Adelaide; the relatively 
small changes in both Newble’s and the present study attest either a relative lack of 
change in student approach with seniority, or the insensitivity of the instrument. 
This point cannot be resolved conclusively at this stage, but the range and 
consistency of scores in published reports gives at least some confidence in the 
psychometric sensitivity of the instrument. The rising trend of Negative Attitudes to 
Study with increasing seniority of Newcastle students is supported by the earlier 
cross-sectional finding of a trend of diminishing satisfaction with the learning 
environment in more senior students (Clarke et al., 1984), and suggests that at least
that section of the instrument is sensitive to changes in students’ approaches. 


The failure of this study to demonstrate the concurrent validity of the inventory 
is perhaps not surprising. The inventory taps students’ perceptions, which may not 
be reflected in the way they behave; the circumstances of tutorial contact may not 
have been appropriate for exhibiting that behaviour; and the tutors may not have 
been perceptive in observing or accurate in recording student behaviour. This issue 
awaits resolution. 


The second aim of the study was to see whether these insights into the process of 
learning could be used to help students in difficulty. Whilst a definitive answer to 
this question must await a properly designed intervention study, one interesting 
pointer emerges. The learning pathologies Globe-trotting and Improvidence are 
both significantly associated (Table 4) with poor academic performance in Year 3, 
which is precisely the stage in the Newcastle educational programme where the 
performance characterised by these pathologies had been noted by Faculty members 
as they marked both written and oral examinations. Negative (but non-significant) 
correlation co-efficients were also found for both pathologies in Year 1 and for 
Improvidence in Year 5. Thus the findings of this study reinforce the clinical 
impressions of Faculty members, and support the strategy of counselling students 
who demonstrate these pathologies by encouraging them to develop a better balance 
between comprehension and operation learning styles. Recent (1985) changes in the 
educational programme have further strengthened the emphasis on clinical 
reasoning and the application of knowledge, and these changes may lead to a 
reduction in the incidence of these pathologies in Newcastle students in the future. 


A small-scale study, using interviews with individual students, is in progress to 
explore this direction. Anecdotally, discussion with students of the different 
approaches subsumed in the Orientations has certainly been a helpful way of 
encouraging student self-exploration, but no hard data are available, except to 
report that no consistent pattern of response to the inventory has emerged from 
inspection of the inventory profiles of all those students (N = 8) who have 
experienced academic difficulty in the 12 months since the inventory was 
administered. Of course, it must be remembered that the prediction of individual
performance was not amongst the aims of the originators of the inventory. 


Differences in age and sex and tertiary experience do not appear to be 
associated with gross differences in approaches to learning. As Watkins (1982) 
found, older students had higher Meaning and lower Reproducing Orientation 
scores, as did males in this study, in contrast to the findings of Watkins and Hattie 
(1981). Biggs (1985a) found a nadir at age 22 in his equivalent of Ramsden’s 
Meaning Orientation; this finding was not confirmed in the present study. Thus sex 
and previous educational background seem unlikely to constitute major risk factors 
for poor approaches to learning, although they should be taken into account in 
counselling individual students. 


The third aim of the study was to examine whether the school’s different 
approach to medical education was reflected in its students’ approaches to learning. 
Vu and Galofre (1983) demonstrated differences between the learning behaviours of 
students in two contrasting American medical schools, whilst Ramsden (1984) 
demonstrated that students in different tertiary institutions and faculties had 
different and usually explicable scores on the Orientations of the Approaches to 
Studying Inventory. The mean subscale and Orientation scores of Newcastle 
students are indeed higher on Meaning and lower on Reproducing Orientations than 
the mean scores given by Ramsden and Entwistle (1981), Ramsden (1984) and 
Watkins (1982), but these authors studied non-medical students. These comparisons 
are encouraging, but a more convincing demonstration is afforded by the 
comparative study reported elsewhere (Clarke and Newble, 1985), and by Coles’ 
(1984) report that Meaning and Reproducing Orientation scores were maintained 
during first year in a problem-based medical school, whereas, for first year students 
in medical schools with more traditional curricula, Meaning Orientation scores fell 
during the year, whilst Reproducing Orientation scores rose. 


In summary, therefore, this study adds support to the concept of an array of 
distinct (but not necessarily orthogonal) approaches to learning; it suggests that the 
concept can be generalised to include medical students, and that students’ 
approaches to learning may reflect the educational philosophy of the institution as 
Entwistle et al. (1979) and Ramsden and Entwistle (1981) suggested. Finally, the 
concept of these approaches to learning offers a potentially productive framework 
for helping individual students who find themselves in academic difficulty. 
ABILITY AND ENVIRONMENT CORRELATES OF
ATTITUDES AND ASPIRATIONS: PERSONALITY GROUP 
DIFFERENCES 


By KEVIN MARJORIBANKS 
(University of Adelaide, Australia) 


Summary. The study examined the proposition that for children of different personalities 
there will be variations in relationships between ability, school environment, and 
measures of school attitudes and aspirations. Data were collected from 500 12-year-old 
Australian children. In the analysis the children were classified into four personality 
groups that were labelled as extravert-anxious, extravert-adjusted, introvert-anxious, and 
introvert-adjusted. Within each group regression surfaces were constructed from models 
that included terms to account for possible linear, interaction, and curvilinear 
associations among the variables. The results indicated that: (a) there were strong 
personality-group variations in children’s attitudes to school and their perceptions of 
school environments, moderate group differences in educational aspirations, and small 
group variations in ability; (b) relations between ability, school environment, and 
measures of attitudes and aspirations varied for children of different personality types; 
(c) ability and environment had differential linear and curvilinear associations with 
attitudes and aspirations, for children from different personality groups. 


INTRODUCTION 


In analyses of the personality of societal groups, Magaro et al. (1985) and Magaro
and Ashbrook (1985, p. 1479) have proposed that "personality serves as the
organising force within the individual that guides interactions with the environment
so that discrete elements of the person and situation are arranged in a meaningful
whole that is manifested in behavior". They suggest further that "personality
characteristics can be thought of as psychological traits whose reliable co-occurrence
within particular clusters of individuals result in their being considered as
personality types" (Magaro and Ashbrook, 1985, p. 1479).


In social-psychological models of status attainment it is generally proposed that 
children's attitudes and aspirations mediate, in part, the impact of ability and 
environment influences on eventual educational attainment (e.g., Looker and 
Pineo, 1983; Carpenter and Western, 1984; Davis, 1985; Shavit and Williams, 
1985). Often, the research has examined the relationships for children when they are 
classified into social categories such as social classes and ethnic groups (e.g., 
Gottfredson, 1981; Hoelter, 1982; Morgan, 1983). If, however, personality is 
considered as an organising force that guides interactions between environments and 
individual characteristics then status-attainment research has been restricted by its 
failure to include personality-type categories. 


This present investigation was generated from the proposition that for children 
of different personalities there will be variations in relationships between ability, 
environments, and school-related outcomes. In particular, the study examined 
relationships between children's perceptions of school environments and their 
attitudes to school and educational aspirations, at various ability levels, for children 
of different personality types. 
METHOD

Subjects 

The sample included 500 12-year-old Australian children, 260 boys and 240 
girls. Findings from the investigation need to be interpreted within the limitation 
that the sample was not random. In the analyses, however, adjustments were made 
for design effects. Significance levels were adjusted using the formula: standard
error of sample estimate = (design effect)^{1/2} × simple random sample standard
error (Ross, 1985).
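The adjustment amounts to inflating the simple-random-sample standard error by the square root of the design effect. A minimal sketch, with a hypothetical function name and hypothetical input values (the design effects used in the study are not reported here):

```python
from math import sqrt


def adjusted_standard_error(srs_standard_error, design_effect):
    """Standard error corrected for a complex (non-random) sample design."""
    return sqrt(design_effect) * srs_standard_error


# Hypothetical values: an SRS standard error of 0.05 and a design effect of 4.
se = adjusted_standard_error(0.05, 4.0)
print(se)
```

A design effect above 1 widens the standard error, so a coefficient must be larger to reach a given significance level than it would under simple random sampling.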


Measures 

The Children's Personality Questionnaire, Form A was used to assess children’s 
personality (Cattell et al., 1970). Data on 14 primary personality dimensions are 
generated from the schedule. When scores on the dimensions were factor analysed 
and the factors rotated, two second-stratum personality factors were identified. The 
factors were labelled: (a) Extraversion-Introversion which loaded strongly on the 
dimensions of aggressive, enthusiastic, casual, tough, sophistication, and 
uncontrolled, and (b) Anxious-Adjusted which loaded on the dimensions of 
unstable, shy, and insecure (also see Allen and Schuerger, 1983; Matthews, 1985). 


The Otis Intermediate Test, Form AB was used to assess intellectual ability 
while a scale was devised to measure children’s perceptions of their school 
environment. A conceptual framework suggested by Bernstein (1977) formed the 
basis for constructing the environment scale. The framework proposes that 
socialisation proceeds within four interrelated situations that are identified as the 
instructional, interpersonal, imaginative, and regulative contexts. For each context, 
Likert-type items that gauged children’s perceptions of teacher behaviour were used 
to construct factor scales (Armor, 1975). Items used to measure perceptions of the 
academic orientation of a school’s instructional context, for example, were of the 
form, "It often seems that teachers in this school are not very interested in whether
we learn or not", and "Teachers in this school really push students to the limits of
their abilities". Perceived friendliness and pastoral quality of the interpersonal
context were assessed by items such as, "This is a very caring school - teachers care
greatly about students", and "Outside of classes most of the teachers are very
friendly and find a lot of time to talk to students". In the imaginative school context
scale there were items of the form, "Most of my teachers encourage us to use a lot of
imagination in our schoolwork", and "Teachers are always trying out new and
often exciting ways of doing things in this school". The regulative context scale,
which measures perceptions of the warmth or severity of authority relationships in a
school, was assessed by items such as, "Generally, those in charge of this school are
not very patient with students", and "There are too many rules and regulations in
this school - you need permission to do almost anything".


When the scores on all the items were factor analysed, using principal 
component analysis, a general factor was generated that had a theta reliability 
estimate of 0.84. In the investigation it was decided, therefore, to adopt one
measure of perceived school learning environment. It was defined positively by 
strong academic orientations, friendly interpersonal interactions, very imaginative 
and innovative practices, and perceived warmth of authority relationships. 
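Theta reliability is conventionally obtained from the number of items and the largest eigenvalue of the principal component analysis, as theta = (p / (p - 1)) (1 - 1 / lambda_1). The sketch below shows that standard definition; the number of items and the eigenvalue are hypothetical, since neither is reported in the text.

```python
def theta_reliability(n_items, largest_eigenvalue):
    """Theta coefficient from the first principal component of the item set."""
    return (n_items / (n_items - 1)) * (1 - 1 / largest_eigenvalue)


# Hypothetical values: 16 items whose first principal component has eigenvalue 4.
theta = theta_reliability(16, 4.0)
print(round(theta, 2))
```

Theta increases as the first component absorbs more of the item variance, which is why a high value was taken to justify a single general environment measure.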


For the development of an attitude to school measure, a tripartite attitude 
model was adopted in which affect, behaviour, and cognition are considered as 
distinct components of attitude (Breckler, 1984; Bagozzi and Burnkrant, 1985). 
Three attitude scales, consisting of five-point Likert-type items, were constructed 
that had theta reliability estimates greater than 0.76. The affective component of
school attitudes was assessed by 12 items such as ‘‘My school work worries me’’, 
and "I get upset when I am asked a question about my work". In the behavioural
dimension there were 20 items such as, "It's good to fool about in class", and "I
work and try very hard in school". The cognitive dimension included 12 items of the
form, "Success in school hardly matters as it has little to do with success in life",
and "Much of the homework we are given is pointless".


Ideal and realistic educational aspirations were assessed by asking the children 
what educational level they would like to attain and what level they really expected 
to achieve. From the responses, five-point aspiration scales were formed. 


RESULTS 


For the analysis, children were classified into four personality types which were 
defined by the median split of scores on the extraversion-introversion and anxious- 
adjusted dimensions. The types were labelled as extravert-anxious (151 children), 
extravert-adjusted (96), introvert-anxious (155), and introvert-adjusted (98 
children). Because of the relatively small sample size in each category, for 
undertaking multivariate investigations, the analyses were not conducted separately 
for girls and boys, which is a further restriction of the study. 
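The median-split classification can be sketched as follows. The direction of scoring (higher scores meaning more extravert and more anxious) and the treatment of scores falling exactly on the median are assumptions made for the illustration, and the scores themselves are hypothetical.

```python
from statistics import median


def classify_personality(extraversion, anxiety):
    """Assign each child to one of four types by median split on two factors.

    Assumptions: higher scores mean more extravert / more anxious, and a score
    exactly at the median is placed in the lower (introvert / adjusted) group.
    """
    ext_cut = median(extraversion)
    anx_cut = median(anxiety)
    labels = []
    for e, a in zip(extraversion, anxiety):
        first = "extravert" if e > ext_cut else "introvert"
        second = "anxious" if a > anx_cut else "adjusted"
        labels.append(f"{first}-{second}")
    return labels


# Hypothetical factor scores for six children.
extraversion_scores = [3, 8, 5, 9, 2, 7]
anxiety_scores = [6, 2, 9, 8, 1, 4]
types = classify_personality(extraversion_scores, anxiety_scores)
print(types)
```

Because the two factors are split independently, the four cells need not be equal in size, as the unequal group counts reported above show.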


In Figure 1 the profiles show the personality-group differences in mean scores 
for the ability, environment, attitude, and aspiration measures. Scores for the figure 
(and for later figures) were standardised with means of 50 and standard deviations 
of 10, calculated over the total sample. The profiles indicate that, relatively,
introvert-adjusted children were characterised by above-average ability, positive 
perceptions of school, positive school attitudes, and moderate educational 
aspirations. In contrast, extravert-anxious children had lower ability, more negative 
perceptions of school, less positive school attitudes, and lower educational 
aspirations. For each variable the differences in mean scores between these two 
groups were statistically significant beyond the 0.01 level of probability.
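The standardisation used for the figures rescales each measure so that the total sample has a mean of 50 and a standard deviation of 10. A minimal sketch, assuming the population form of the standard deviation (the text does not say which form was used) and using hypothetical raw scores:

```python
from statistics import mean, pstdev


def standardise(scores, target_mean=50.0, target_sd=10.0):
    """Rescale scores to a chosen mean and standard deviation."""
    m = mean(scores)
    sd = pstdev(scores)  # assumption: population SD over the total sample
    return [target_mean + target_sd * (s - m) / sd for s in scores]


# Hypothetical raw scores for five children.
raw = [90, 100, 110, 120, 130]
scaled = standardise(raw)
print([round(s, 1) for s in scaled])
```

On this scale a group mean of 55 lies half a standard deviation above the total-sample mean, which makes profiles on different measures directly comparable.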


The profiles also show that although introvert-anxious children had relatively 
favourable perceptions of school and positive behavioural and cognitive attitudes, 
they expressed negatively-oriented affective attitudes. Extravert-adjusted children, 
in contrast, had relatively unfavourable perceptions of school, more negative 
behavioural and cognitive attitudes, but positive affective-oriented attitudes. That 
is, while introvert-anxious children worked hard in class and generally appreciated
the worthwhileness of school, they were also anxious and worried by their school.
Extravert-adjusted children, however, tended to "enjoy fooling about" in class, had
a negative orientation about the worthwhileness of school, but were not upset by 
school. 


When personality-type was included in multiple regression models as a set of 
mutually exclusive categories, it had strong associations with the affective (0.46),
behavioural (0.45), and cognitive (0.33) attitude scores, and with children's
perceptions of school (0.32). The relations to ideal (0.20) and realistic (0.24)
educational aspirations were more moderate while the association with ability
(0.14), although significant, was particularly modest. This latter finding suggests
support for the proposition that personality and intelligence are essentially 
independent psychological constructs (Eysenck, 1971; Saklofske, 1985). These 
initial results indicated, therefore, a set of moderate to strong personality-type 
differences in children's perceptions of their school environments, educational 
aspirations, and attitudes to school. 

The first investigation of relationships between ability, environment, and 
children's attitudes and aspirations involved an analysis of zero-order correlations. 
In Table 1 the results show that, in the two introvert groups, children's ability had 
strong associations with the attitude dimensions and was related moderately to 
aspirations. Ability had negligible to moderate associations with attitudes in the 
extravert groups, was related strongly to the aspirations of extravert-anxious 
children, and had negligible associations with aspirations in the extravert-adjusted 
group. The results also show that the ability-attitude correlations in the introvert 
groups were often significantly greater than those in the extravert groups. In 
contrast, the ability-aspiration associations were significantly higher for 
extravert-anxious children than for children in the extravert-adjusted group and for the 
realistic aspirations of introvert-adjusted children. 


FIGURE 1 


PERSONALITY-GROUP PROFILES OF MEAN SCORES ON ABILITY, SCHOOL ENVIRONMENT AND 
OUTCOME MEASURES 


School environment had moderate to strong associations with behavioural and 
cognitive attitudes in each personality group while it had negligible to modest 
associations with the affective attitude dimension. In general, children's perceptions 
of school environments were not related to their educational aspirations. There were 
no significant personality group differences in the school environment correlations. 
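
The text does not state which test was used to compare correlations between personality groups. One common choice for independent samples is Fisher's r-to-z transformation, sketched below under that assumption. The two correlations are taken from Table 1 (ability with cognitive attitudes in the introvert-anxious and extravert-anxious groups); the group sizes of 200 are hypothetical, as the sizes are not given in this section.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test of the difference between two independent correlations."""
    z1 = math.atanh(r1)  # Fisher r-to-z transformation
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal probability
    return z, p


# Correlations from Table 1; group sizes are hypothetical
z_stat, p_value = fisher_z_difference(0.54, 200, 0.34, 200)
```

With smaller groups the same difference in correlations would give a smaller test statistic, so the outcome depends on the actual group sizes.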


Zero-order correlations provide, however, only a limited understanding of the 
covariate structure of variables. Therefore, the relations between ability, 
environment, and outcome measures were investigated further by plotting regression 
surfaces generated from two-stage hierarchical regression models. In the models, 
product terms were included to test for possible interaction effects and squared 
terms were added to examine possible curvilinear relationships. Eysenck (1983) has 
stressed the importance of analysing parametric variables in personality research 
while prior studies have indicated the possible presence of interaction and 
curvilinear relations between measures of ability, children’s environments, and 
school-related outcomes (Fraser and Fisher, 1983; Song and Hattie, 1984; Walberg 
and Tsai, 1984; Willms, 1985). That is, the regression models were of the form: Z = 
aX + bY + cXY + dX² + eY² + constant, where Z, X, and Y represented 
attitudes or aspirations, ability, and school environment, respectively. The 
interaction and squared terms were formed by subtracting the mean scores from the 
raw scores on the relevant variables (Pedhazur, 1982). In the second stage of the 
analyses, interaction and squared terms that were not related to the outcome 
measures were deleted from the regression models, and associations between the 
variables were re-examined. 
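
The two-stage procedure described above can be sketched as follows. This is an illustration on synthetic data, not the analysis reported here: the function names, the simulated scores, and the retention rule (a product or squared term is kept when its weight is at least twice its standard error, borrowed from the footnote to Table 2) are all assumptions.

```python
import numpy as np


def fit_ols(design, z):
    """Ordinary least squares: returns the weights and their standard errors."""
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ coef
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return coef, se


def two_stage_fit(x, y, z):
    """Stage 1 fits Z = aX + bY + cXY + dX^2 + eY^2 + constant; stage 2 refits
    after dropping product and squared terms with weights below twice their SE."""
    xc, yc = x - x.mean(), y - y.mean()  # deviation scores for the extra terms
    names = ["X", "Y", "XY", "X2", "Y2", "constant"]
    cols = [x, y, xc * yc, xc ** 2, yc ** 2, np.ones_like(x)]
    coef, se = fit_ols(np.column_stack(cols), z)
    keep = [i for i, name in enumerate(names)
            if name in ("X", "Y", "constant") or abs(coef[i]) >= 2 * se[i]]
    coef2, _ = fit_ols(np.column_stack([cols[i] for i in keep]), z)
    return {names[i]: float(c) for i, c in zip(keep, coef2)}


# Synthetic scores with a built-in curvilinear ability effect (hypothetical)
rng = np.random.default_rng(0)
ability = rng.normal(50, 10, 400)
environment = rng.normal(50, 10, 400)
attitude = (0.25 * ability + 0.20 * environment
            + 0.01 * (ability - 50) ** 2 + rng.normal(0, 3, 400))
weights = two_stage_fit(ability, environment, attitude)
```

Forming the product and squared terms from deviation scores, as the text describes, reduces their correlation with the linear terms and so keeps the linear weights interpretable.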


Generally, the findings in Table 2 indicate that, in each personality group, the 
ability and environment measures combined to have moderate to strong associations 
with attitudes and more modest relationships with educational aspirations. There 
were, however, group differences in the amounts of variance associated with the 
predictor variables. Also, the regression weights revealed the possible complexity of 
relations among the variables in the various personality groups. In the 
extravert-adjusted group, for example, the squared-ability term made a significant 
contribution to the variance in each of the attitude and aspiration measures. Also, 
for introvert-anxious children, the squared-environment term was related to the 
behavioural attitude dimension, the ability-environment interaction was associated 
with cognitive attitudes, and the curvilinear ability term was related to ideal 
educational aspirations. 


From the raw regression weights in Table 2, regression surfaces were plotted to 
explore further the nature of group variations in the relationships among the 
measures. Because of space limitations it was not possible to present all the surfaces. 
Instead, eight surfaces that reflect the nature of the different associations among the 
variables have been chosen and presented in two figures. In Figure 2 the surfaces 
show the regression-fitted relations between ability, school environment, and 
behavioural attitudes, for children in each personality-type group. For the 
extravert-anxious and introvert-adjusted children, the surfaces are representative of those 


TABLE 1 

ZERO-ORDER CORRELATIONS BETWEEN ABILITY, PERCEPTIONS OF SCHOOL ENVIRONMENT AND MEASURES 
OF ATTITUDES AND ASPIRATIONS 

Personality                               Attitude Dimensions                 Aspirations
group                 Correlate      Affective   Behavioural   Cognitive   Ideal     Realistic
Extravert-anxious     Ability        0.23** b    0.16*         0.34** a    0.43**    0.43** b
                      Environment    0.22**      0.42**        0.49**      0.16*     0.08
Extravert-adjusted    Ability        0.38**      0.06 b        0.37**      0.12 a    0.11*
                      Environment    0.12        0.37**        0.30**      0.13      0.03
Introvert-anxious     Ability        0.43**      0.36** b      0.54** a    0.32**    0.26**
                      Environment    0.17*       0.45**        0.32**      0.07      0.10
Introvert-adjusted    Ability        0.55** b    0.49** c      0.49**      0.32**    0.19* b
                      Environment    0.19        0.37**        0.30**      0.17      0.09

*P < 0.05. **P < 0.01. 

Note: 

Correlations associated with ability are uppermost in each pair of coefficients while those related to 
perceptions of school environment are lowermost. 

a, b, c Differences in corresponding correlations between the personality groups are significant 
beyond 0.05. 
TABLE 2 

RAW REGRESSION WEIGHTS FOR THE MULTIPLE REGRESSION OF ATTITUDES AND ASPIRATIONS ON ABILITY 
AND SCHOOL ENVIRONMENT FOR CHILDREN FROM EACH PERSONALITY GROUP 

Ability and                                  Personality group
environment                Extravert-   Extravert-   Introvert-   Introvert-
variables                  anxious      adjusted     anxious      adjusted

Affective attitudes
  Ability                  0.140*       0.230*       0.186*       0.280*
  Environment              0.120*       -0.017       0.167*       0.053
  (Ability)²               a            0.003*       a            a
  Multiple R               0.331        0.483        0.458        0.559
  Adjusted R²              9.75         20.87        19.90        29.76

Behavioural attitudes
  Ability                  0.198*       0.251*       0.240*       0.352*
  Environment              0.404*       0.256*       0.199*       0.211*
  (Ability)²               a            0.004*       a            a
  (Environment)²           a            a            0.003*       a
  Multiple R               0.465        0.446        0.583        0.581
  Adjusted R²              20.54        17.28        32.71        32.42

Cognitive attitudes
  Ability                  0.284*       0.219*       0.204*       0.213*
  Environment              0.279*       0.113*       0.120*       0.094*
  Ability x environment    a            a            0.005*       a
  (Ability)²               0.006*       0.002*       a            a
  Multiple R               0.640        0.573        0.648        0.546
  Adjusted R²              39.76        30.60        40.78        28.37

Ideal educational aspirations
  Ability                  0.052*       0.026*       0.038*       0.052*
  Environment              0.016*       0.010        0.008        0.021*
  Ability x environment    a            a            a            -0.002*
  (Ability)²               0.001*       0.001*       0.001*       a
  Multiple R               0.500        0.294        0.375        0.446
  Adjusted R²              23.45        5.66         12.33        17.32

Realistic educational aspirations
  Ability                  0.039*       0.024*       0.022*       0.035*
  Environment              0.009        0.002        0.008        0.016*
  Ability x environment    a            a            a            -0.002*
  (Ability)²               a            0.001*       a            a
  Multiple R               0.441        0.346        0.280        0.340
  Adjusted R²              18.34        9.11         6.61         8.75

* Value of regression weight exceeds at least twice its standard error. 

Note: 

a Variable not included in the second stage of the hierarchical regression analysis. 
All multiple correlations significant beyond 0.05 level. 
All R² were adjusted for size of samples. 

regression models in which only linear relationships were significant. At each ability 
level, increases in perceived school environment scores from 30 to 70 were associated 
with increments of approximately 20 and 10 regression-fitted points in the 
behaviour-oriented attitudes of extravert-anxious and introvert-adjusted children, 
respectively. For extravert-adjusted children, ability tended to act as a threshold 
variable so that until a certain ability level was reached there was no association with 
the attitude scores. 

After that threshold value, however, further increments in ability were related 
to increasing changes in attitudes. At a perceived school environment value of 30, 
for example, the fitted-behavioural attitude scores were 38, 38, and 43 at ability 
levels of 40, 50, and 60, respectively. In the introvert-anxious group, behavioural 
attitudes had a linear association with ability and a curvilinear relationship with 
school environment. At each ability level, after the perceived school environment of 
introvert-anxious children attained a certain threshold value, further increments in 
the favourableness of the environment were related to sizable changes in behavioural 
attitudes. That is, the surfaces in Figure 2 indicated that when the children were 
classified by personality type, their ability and perceptions of school environment 
had differential associations with behaviour-oriented school attitudes. 


FIGURE 2 


FITTED-BEHAVIOURAL ATTITUDE SCORES IN RELATION TO ABILITY AND SCHOOL 
ENVIRONMENT FOR PERSONALITY GROUPS 

In Figure 3 the surfaces show the regression-fitted relations between ability, 
school environment, and ideal educational aspirations, for children in each 
personality-type group. At each ability level, children’s perceptions of school 
environment were not related to the educational aspirations of introvert-anxious and 
extravert-adjusted children. In these two groups, at each environment level, ability 
had a positively increasing curvilinear association with aspirations. For 
extravert-anxious children, their perceptions of school had a linear relation to their ideal 
aspirations. At each ability value, for example, increases in environment scores from 
30 to 70 were associated with increments of approximately 10 regression-fitted 
aspiration points. The shape of the surface for introvert-adjusted children reflects 
the presence of a significant interaction effect. At low ability levels, changes in 
environment scores were not related to variations in educational aspirations. For 
high ability values, however, increases in perceived environment were associated 
with sizable increments in aspirations. At an ability level of 65, for example, 
increases in environment scores from 30 to 70 were related to increments of 
approximately 18 regression-fitted aspiration points. The surfaces in Figure 3 indicate again 
that ability and perceived school environment had variable associations with the 
outcome measures, for children of different personality types. 


FIGURE 3 


FITTED-IDEAL ASPIRATION SCORES IN RELATION TO ABILITY AND SCHOOL 
ENVIRONMENT FOR PERSONALITY GROUPS 


DISCUSSION 


The findings of this present study suggest that if children are considered in 
relation to the personality dimensions of extraversion-introversion and anxious- 
adjusted, then: (a) there are strong personality-group variations in children’s 
attitudes to school and their perceptions of school environments, moderate group 
differences in educational aspirations, and small group variations in intellectual 
ability, (b) relations between ability, school environment, and measures of attitudes 
and aspirations vary for children of different personality types, and (c) ability and 
environment have differential linear and curvilinear associations with attitudes to 
school and educational aspirations, for children from different personality groups. 


It has been proposed by Blass (1984, p. 1023) that there is a ‘‘new flexibility 
among social psychologists about incorporating personality variables into their 
theories and research strategies. And among personality psychologists they portend 
an increasingly explicit recognition of situational influences and the constraints they 
impose on dispositional determination’’. The interaction between personality 
dimensions and situational characteristics has become a focus of much recent 
research. Many of the investigations, unfortunately, have been restricted to an 
examination of how much variation in an outcome measure is related to personality, 
situation, and the interaction between personality and situation (Furnham, 1981; 
Magnusson, 1981; Snyder, 1983; Ayton and Wright, 1985; Furnham and Jaspars, 
1985). This present analysis suggests, however, the proposition that relations 
between children’s school-related outcomes and measures of their cognitive ability 
and learning environments differ as a function of their personality style. That is, 
personality may be considered to act as a critical substratum variable (see Mitchell, 
1984). What is required now are further investigations that examine relationships 
between other person variables, a variety of learning contexts, and outcomes for 
children from different personality groups. Only when such research is completed 
will it be possible to conclude to what extent personality is an organising force that 
guides interactions between children’s individual characteristics, their learning 
environments, and eventual educational attainments. 


Requests for reprints should be addressed to Professor Kevin Marjoribanks, Professor of 
Education, University of Adelaide, Adelaide, S. Australia 5001. 


THE VALIDITY OF THE BOEHM TEST OF 
BASIC CONCEPTS 


By EILEEN F. SMITH 


Summary. Content, predictive and construct validity of the BTBC are assessed. Most of 
the 50 BTBC concepts are found to be seminal to the infant school curriculum. Predictive 
validity evidence shows relationships between BTBC scores and later achievements in 
language, mathematics and reading as assessed by teachers. The BTBC appears to be a 
multi-dimensional test of general cognitive development as well as being a guide to 
concept acquisition. Diversity is shown in the efficacy of test items: non-linguistic 
information, children’s conceptual preferences and limitations in item design, as well as 
semantic, syntactic and conceptual knowledge, influence responses. 


INTRODUCTION 


THE Boehm Test of Basic Concepts has recently been modified and standardised on 
children in England (Smith, 1984, 1986). Attempts were made to assess the content, 
predictive and construct validity of the BTBC. 


No evidence of validity was presented with the BTBC (Boehm, 1971). 
Subsequently, Boehm et al. (1980) assessed content validity by examining printed 
materials and teachers’ verbal instructions in American schools. Boehm (1971, 1980) 
claimed that the 50 BTBC concepts are assumed by teachers to be understood by 
children, are seldom taught and are necessary for following instructional directions 
and for school success. Even if these claims are correct for American education does 
the same hold true for the UK? 


The theoretical foundation on which the BTBC was based is not made explicit 
(Boehm, 1971). Statements about the purpose of the test appear rooted in 
pragmatism: “The BTBC is designed to measure children's mastery of concepts 
considered necessary for achievement in the first years at school” (p. 3). The test’s 
rationale is described as though concept understanding and concept labelling are 
synonymous, or, at any rate, synchronous. Underpinning the rationale appears to 
be the assumption that there is a convergence of the child's linguistic and cognitive 
abilities which is tapped by the test and the resulting measurement is assumed to 
assess concept mastery. The test's stated purpose is to identify children whose 
“overall level of concept mastery is low”. 


Is a level of concept mastery a viable construct? Piagetian theory asserts that 
the answer lies in the level of thinking developed by the child. Within the broad 
stages of preoperational and operational thinking, however, any one child does not 
necessarily achieve exactly the same level of development across all concepts at any 
given time (Inhelder et al., 1974; Piaget, 1977, 1978; Vuyk, 1981a, 1981b). 
Conceptual Learning and Development theory (CLD) suggests that any of four successive 
levels of concept attainment can be ascertained for any particular concept but the 
same levels of different concepts are mastered at different ages (Klausmeier, 1979). 
This is related to the concept domain and the abstractness of the examples 
presented. Individual concepts coexist at different levels of sophistication (Brainerd, 
1979). Vygotsky’s (1962) hierarchical model of concept organisation suggests that a 
stage level might be ascertained. In none of these models is concept mastery 
regarded as a unidimensional construct. 


Can a summary score, obtained from a multidimensional test with item content 
from various domains, estimate with any accuracy a child's level of concept 
mastery? Recent developments in the analysis of item response patterns suggest that 
global scores can be misleading indicators of a child’s attributes (Hunt and 
Lansman, 1975; Hutt, 1980; Harnisch, 1983). Though summary scores may serve as 
useful predictors of success, the interrelatedness of subsets of items makes the 
diagnosis of individual strengths and weaknesses difficult. However, global scores 
may indicate the standard of performance reached by groups of pupils, relative to 
other groups, where the criteria are defined in terms of particular knowledge or 
skills. This is contingent on the criterion measurements being reliable and unbiased 
(Jensen, 1980). Where concept mastery is concerned it would appear that each 
concept or, at the least, each group of concepts being tested should be examined 
separately. Each concept or test item may need its own criterion reference. To do 
this for each of the 50 BTBC items was beyond the scope of this research. However, 
attempts were made to investigate the validity and reliability of each item. 


1. CONTENT VALIDITY 


METHOD 


The spoken and written language of first school education was sampled with a 
view to assessing the use of BTBC concepts. 


(a) Teachers’ discourse was observed across a range of curriculum involving 11 
teachers in two schools in contrasting localities for a total of 11 hours 8 minutes with 
classes ranging in age from three to eight years. 


(b) Seven commonly used graded reading series, the workbooks and/or 
teacher’s handbook from eight mathematics schemes, two books from the Schools 
Council project Science 5/13 and the manual for Assessment in Nursery Education 
(Bate and Smith, 1978) were scrutinised for BTBC terms and their parts, derivatives, 
synonyms and antonyms. 


RESULTS 


(a) In the observed sample of teachers’ discourse 13 of the original 50 BTBC 
expressions were never used and 12 of the BTBC terms as presently adapted were not 
used but only one (‘‘several’’) was not represented in some form. 


(b) Eight of the 50 concept terms did not appear in any of the reading books 
examined. A further 16 terms were seldom represented. The frequency with which 
the remaining BTBC terms occurred, either in whole or in part, suggests that 62 per 
cent of the BTBC terms (as presently adapted) are relevant to the content of graded 
reading series used in first schools. However, 16 per cent of the BTBC terms are 
unlikely to be encountered by children even if their reading books are drawn from 
several series. 


Only two of the BTBC terms, as presently adapted, did not appear, either in 
whole or in part, in the vocabulary recommended by the mathematical handbooks 
for teachers but both these terms (‘‘several’’ and ‘‘separated’’) occurred, though 
rarely, in the workbooks examined. Only three terms (‘‘alike’’, ‘‘left’’, ‘‘equal’’) 
were absent from the mathematical vocabulary suggested to nursery teachers. 


The early stages of Science 5/13 contained all but four of the BTBC terms as 
used in the present adaptation either in whole or in part: these are “whole”, 
“separated”, “miss out”, “third”. 


The oral directions and questions used by Bate and Smith (1978) contained 20 
of the BTBC terms exactly as used in the present adaptation and six further terms in 
part; 24 of the BTBC terms did not feature though some of the concepts were used 
with different vocabulary (e.g., ‘‘line’’ instead of ‘‘row’’). 


DISCUSSION 


Jensen’s (1980) definition of content validity emphasises that items should be 
representative of the specified universe of knowledge. Boehm (1971) defines this as 
(a) a group of concepts considered necessary for achievement in the first years at 
school and (b) which are ‘‘seldom if ever explicitly defined, or (are) defined in their 
simple forms but subsequently used in complex forms without adequate transitions” 
(p. 3). Which concepts are used in complex form without adequate transitions is not 
clear nor can any evidence be found of this occurring in classroom practice. 


Judged on the basis of their appearance in the samples of teachers’ discourse 
and in school texts, 20 of the BTBC items were valid in content. A further six items 
were accepted as valid by the first criterion but it is not taken for granted by teachers 
that they are understood by young children. Indeed much of the nursery and infant 
curriculum is planned around introducing and consolidating such concepts 
(“most”, “second”, “half”, “in order”, “third from”, “least”). This is seldom 
achieved by verbal definition alone: carefully structured play materials and activities 
together with the skills of an adult experienced in talking with young children, at the 
child’s level of meaning, are required. Furthermore, it is questionable that any of the 
first group of concepts mentioned above, if misunderstood, would escape the 
attention of a skilled nursery or infant teacher. 


A further seven items were deemed valid by Boehm’s first criterion if the test 
were used with 6- and 7-year-olds: few, farthest, widest, between, as many, 
middle-sized, miss out. The concepts embodied by these terms may be learnt earlier (in part 
at least) but not necessarily met in these forms in the reception class or nursery. 


If the concept labels were changed, a further six items would become valid by 
both criteria: around to round, almost to nearly, beginning to starting, alike to 
same, below to under, matches to goes with or belongs. 


There was no evidence that understanding of the remaining 11 concept terms 
was necessary for achievement in the first years at school in England: away from, 
some not many, several, centre, not first or last, right, left, forward, separated, pair, 
equal. In particular, ‘‘several’’ is judged to be entirely without face validity. 


In addition to antonyms of the BTBC concepts, such as bottom, outside, which 
were used frequently by the teachers and are present in curriculum materials, the 
following concepts also appeared frequently: apart/together, curved/straight, edge, 
higher than, long/short, tall/short, thick/thin. These seem more relevant to the 
stated purpose of the test than do some of the concepts already included by Boehm. 


2. PREDICTIVE VALIDITY 


METHOD 


Teachers were asked to rate children who had been tested with Form A nine to 
12 months previously. On a three-point scale, staff assessed present achievements in 
spoken language, counting and computational skills (Maths A), mathematical 
concepts (Maths B) and reading. Separate forms were used for each curriculum area 
with criteria for rating delineated and the list of children’s names on each form; 29 
schools were circulated, a total of 528 children. Twenty-three sets of responses were 
returned with ratings of 429 children. 


RESULTS 
A product moment correlation matrix (with missing data) of the four sets of 
ratings, sex and the earlier BTBC scores (unsmoothed) is shown in Table 1. 


TABLE 1 
CORRELATION MATRIX OF BTBC SCALED SCORES, SEX AND TEACHERS’ ASSESSMENTS 

            Sex       Speech    Maths A   Maths B   Reading
BTBC        0.029     0.449*    0.481*    0.457*    0.424*
Sex         -         0.065     0.024     -0.018    0.279*
Speech      -         -         0.611*    0.615*    0.606*
Maths A     -         -         -         0.868*    0.699*
Maths B     -         -         -         -         0.699*

*P < 0.01 
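
How the missing data were handled is not stated. One common approach for a matrix of this kind is pairwise deletion, in which each coefficient is computed from the children who have both scores. The sketch below assumes that approach; the BTBC scores and three-point reading ratings are hypothetical, and `None` marks a missing rating.

```python
import math


def pairwise_pearson(xs, ys):
    """Product moment correlation over cases with both values present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("too few complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical data: BTBC scores and later reading ratings (1 to 3)
btbc = [38, 42, 30, 45, 35, 40, 28, 44]
reading = [2, 3, 1, 3, None, 2, 1, None]
r_reading, n_used = pairwise_pearson(btbc, reading)
```

Under pairwise deletion each cell of the matrix may rest on a different number of children, which is worth bearing in mind when coefficients are compared.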

DISCUSSION 


The results suggest that the BTBC scores, scaled for age, can predict to some 
extent children’s achievements, some nine to 12 months later, in mathematics, 
reading and spoken language as assessed by teachers. The correlations are 
compatible with correlations between the BTBC and subsequent achievement tests: 
SRA math 0.56, SRA reading 0.43 (Piersel and McAndrews, 1982), median 
correlation of Stanford Achievement subtests 0.47 (Estes et al., 1976), SAT spelling 
0.54, SAT language 0.53, SAT arithmetic (concepts) 0.65 (Steinbauer and Heller, 
1978), Gates reading 0.56 (Busch, 1980). 


The advantages and defects of teachers’ assessments of the level of attainment 
of their pupils are well-documented (e.g., Schools Council, 1964). In early 
childhood it is likely that teachers' assessments are as reliable as standardised 
classroom tests. The reliability of the ratings made in this study is not known but it is 
fairly safe to assume that the assessments were made conscientiously. There are no 
known factors related to the grading procedure that could have caused bias, 
although bias in favour or against individual children could not be controlled. The 
coarse grading (above average, average, below average) should have contributed to 
reliability: most variability in assessment is within the middle range of ability. Even 
though it is impossible to obviate the halo effect entirely the steps taken probably 
reduced it. Not all children were given the same grade for each area of the 
curriculum. 


3. CONSTRUCT VALIDITY 
The claim that the BTBC is essentially a one-factor test of ability to follow 
directions (Piersel and Reynolds, 1981; Piersel and McAndrews, 1982) seems 
unlikely. Nevertheless, the Rasch latent trait model was tried as a method of analysis 
of the item response data for the English standardisation of the BTBC (Smith, 1984, 
1986). Discrepancies between expectations and observations confirmed the 
multi-dimensional nature of the BTBC which was factorially analysed. 
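
For readers unfamiliar with the Rasch model, the sketch below shows its basic form: the probability of passing an item depends only on the difference between a child's ability and the item's difficulty, both on a single logit scale. The function names and the comparison of expected with observed pass rates are illustrative assumptions, not the fitting procedure used in the standardisation.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the one-parameter Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_and_observed(abilities, responses, difficulty):
    """Mean expected pass rate under the model and the observed pass rate."""
    expected = sum(rasch_probability(a, difficulty) for a in abilities) / len(abilities)
    observed = sum(responses) / len(responses)
    return expected, observed


# Hypothetical ability estimates (logits) and 0/1 responses to one item
abilities = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [0, 1, 0, 1, 1]
expected_rate, observed_rate = expected_and_observed(abilities, responses, 0.0)
```

Large and systematic gaps between expected and observed rates across items are the kind of discrepancy that suggests a single dimension is not enough.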

When testing children’s understanding of concept terms we need to know if 
their correct responses reflect understanding or whether their conceptual preferences 
make it appear that they know these terms before they actually do (Clark, 1980). We 
need to be aware of the logical and grammatical constraints of the sentences used to 
formulate task instructions (Clark and Clark, 1977; French and Brown, 1977; 
Donaldson, 1978; Weil and Stenning, 1978; Kavanaugh, 1979; Trosborg, 1982). 
With young children the memory load of complex sentences must be considered 
(Bryant, 1974; Kavanaugh, 1979; Trosborg, 1982). Where a child’s interpretation of 
the context of an utterance conflicts with his imperfect semantic knowledge he is 
more confident of the non-linguistic context (Hoogenraad et al., 1978). 


It was predicted that the nature of the children’s incorrect responses to each 
BTBC item would reveal some of the non-linguistic strategies they employed when 
their linguistic knowledge was imperfect. By analysing the nature of the responses to 
each item on both Forms at each age level an attempt was made to ascertain 
whether, guesswork apart, a child might arrive at the right answer without fully 
understanding the concept or answer incorrectly when he might well show 
understanding in a clearer context. 


METHOD 
A randomly selected sample of 1,928 children from 78 schools in three LEAs 
was tested with the BTBC, Form A. The test directions were modified for British 
use. Raw scores were converted to standard scores by age level in half years from 3:6 
to 7:6. A subsample of 144 children from 14 schools was tested with both parallel 
forms, A and B. All testing was individual. 


(a) Principal components analyses were carried out on the correlation matrix of 
the BTBC item scores (Form A) with Varimax and Promax rotations. 


(b) Facility and discrimination values were computed for each test item at each 
six-monthly age level from 3:6 to 7:6. 


(c) Each item was analysed in terms of correct responses and the various kinds 
of errors made. The main considerations were firstly, whether a child who passed an 
item did so because he understood the concept and its label or whether he could pass 
(guesswork apart) without any, or only partial, understanding. Secondly, 
predictions were made about whether a child who understands the concept might fail 
the item. 
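
Step (a) above can be sketched as follows, as an illustration only. It assumes principal components extracted from the inter-item correlation matrix followed by a standard orthogonal Varimax rotation; the Promax rotation is not shown. The item scores are synthetic, and the function names and the choice of two components are assumptions.

```python
import numpy as np


def principal_components(item_scores, n_components):
    """Eigenvalues and loadings of the leading components of the correlation matrix."""
    corr = np.corrcoef(item_scores, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # returned in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    return eigvals[order], loadings


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion <= criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


# Synthetic 0/1 item scores (hypothetical): 300 children, 8 items, two traits
rng = np.random.default_rng(1)
traits = rng.normal(size=(300, 2))
item_trait = np.array([0, 0, 0, 0, 1, 1, 1, 1])
difficulty = np.linspace(-1.0, 1.0, 8)
prob = 1.0 / (1.0 + np.exp(-(1.5 * traits[:, item_trait] - difficulty)))
item_scores = (rng.random((300, 8)) < prob).astype(float)

eigvals, loadings = principal_components(item_scores, n_components=2)
rotated = varimax(loadings)
```

An orthogonal rotation redistributes variance among the components without changing each item's communality, so instability of rotated loadings across age levels reflects the data rather than the rotation itself.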


A child may choose any of several pictures in response to each BTBC item. For 
example, item 15 Form A depicts three cakes: the first has a big wedge missing, the 
second is the whole cake, the third has a small wedge missing. The child is asked to 
find “the cake that is whole, the whole cake”. Young children who have acquired or 
are acquiring the left-right orientation in reading and picture recognition might opt 
for the first picture if uncertain of “whole”. On Form B the “whole apple” is 
placed first on the page. Differences between the frequencies with which the first 
and third pictures were selected (Form A) were tested with the chi-square one-sample 
test. The error frequencies for the second and third pictures on Form B were 
similarly examined. 
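
The chi-square one-sample test applied to two error choices can be sketched as below. It assumes the null hypothesis that children who err are equally likely to pick either wrong picture; the counts are hypothetical, and the probability is computed for one degree of freedom.

```python
import math


def chi_square_two_choices(count_a, count_b):
    """One-sample chi-square test that two error choices are equally frequent."""
    total = count_a + count_b
    expected = total / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail probability for df = 1
    return chi2, p


# Hypothetical error counts: first picture chosen 60 times, third picture 30 times
chi2_value, p_value = chi_square_two_choices(60, 30)
```

A significant result indicates a preference among the wrong pictures, for instance for the first object in the row, rather than random guessing.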


For each item, error data were collected for each age level before totalling so that the 
data could be inspected for unexpected response patterns which might have operated at 
different age levels. Other kinds of error frequencies which were tested were those 
which suggested that children may have been influenced by conceptual preferences 
such as the larger amount, the topmost object or the nearest object (Clark and 
Clark, 1977). 


RESULTS 
(a) Principal components analyses yielded no clear major factors. A large 
number of first order factors were produced at each age level. Neither Varimax nor 
Promax rotations produced consistent factor loadings on the test items across the 
nine age levels. 


(b) The distribution of item responses (Form A) by age level is shown in Table 2 
(N = 1,928). Table 3 shows the consistency of scoring on both forms (A and B) for 
each item (N = 144). 

(c) The qualitative analysis of each of the 50 BTBC items is illustrated here by 
five examples starting with item 15 which is described above. 


Item 15: “whole” 
This item discriminates most usefully from 4:6 to 5:6. It is a difficult item for 
the youngest children and easy for most 6- and 7-year olds. 
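
Statements of this kind rest on the facility and discrimination values computed for each item. The sketch below takes facility as the proportion of children passing the item and, as an assumption, takes discrimination as the correlation between the item score and the total on the remaining items; the index actually used is not specified in the text. The responses are hypothetical.

```python
from statistics import mean, pstdev


def facility(item):
    """Proportion of children answering the item correctly."""
    return mean(item)


def discrimination(item, rest_totals):
    """Correlation of the 0/1 item score with the total on the other items."""
    sd_item, sd_rest = pstdev(item), pstdev(rest_totals)
    if sd_item == 0 or sd_rest == 0:
        return 0.0  # no variance, so the item cannot discriminate
    mean_item, mean_rest = mean(item), mean(rest_totals)
    cov = mean([(i - mean_item) * (t - mean_rest)
                for i, t in zip(item, rest_totals)])
    return cov / (sd_item * sd_rest)


def item_analysis(responses):
    """Return (facility, discrimination) for each item; rows are children."""
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        results.append((facility(item), discrimination(item, rest)))
    return results


# Hypothetical 0/1 responses of six children to three items
responses = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1]]
item_stats = item_analysis(responses)
```

Running the same analysis separately at each six-monthly age level shows where an item is too hard, too easy, or most useful for separating children.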


Confusion of whole with hole may cause problems in interpretation. The most 
common wrong response to Form A was the first cake in the row. This may mean 
either that the non-linguistic strategy of choosing the first object predominated or 
that the cake with the largest hole was selected. 


The correct answer to item 15 is also the greater amount and, additionally, on 
Form B it is the first object in the row. So there is the possibility that some children 
may respond correctly without understanding whole. Conversely, failure does not 
necessarily mean that the concept of “wholeness” is absent. 


Item 15 appears to be slightly easier on Form A than on Form B both in this 
sample and in Boehm's. This may be because the Form A drawing clearly depicts the 
whole cake whereas the Form B drawing may be seen by a child as a side-view of the 
apple. 


The word whole was seldom observed in teachers’ discourse; it appears in few 
mathematics books and occasionally in only one of the reading schemes and does 
not feature at all in the two Science 5/13 books. However, it is included in the 
manual for Assessment in Nursery Education (Bate and Smith, 1978). 


Item 23: "after" 
‘‘. . . the girl after her hair was cut.’’ (Form A). 


‘‘. . . the piece of wood after it was cut.’’ (Form B). 


To achieve success on this item theoretically requires linguistic understanding of 
a complex sentence, reversible thinking and the ability to master the memory load of 
a relatively lengthy sentence. Reversible thinking is decisive for correct before and 
after constructions (Trosborg, 1982). Sentences which preserve the actual order of 
events are easier than those which reverse the order; the most difficult of all are 
those of the construction ‘‘B after A’’ as in item 23 (Weil and Stenning, 1978). 
Therefore a child who can distinguish after from before could fail this item because 
of the syntactical complexity of the direction. 


Even though by far the commonest error was to point to the drawing which 
depicts the before situation it cannot be assumed that these children, in other 
circumstances, would interpret after as before. Indeed there seems to be no stage 
where after is understood as meaning before (Coker, 1978). 


It is probable that before and after are first acquired as prepositions and then as 
subordinating conjunctions; used as subordinating conjunctions they present 
considerable difficulty (French and Brown, 1977; Coker, 1978). The child may 
TABLE 2 


DISTRIBUTION OF ITEM RESPONSES BY AGE LEVEL 
TABLE 3 
CONSISTENCY OF SCORING ON BOTH FORMS (A AND B) FOR EACH ITEM 


          Form B       Proportion with identical 
Item      Facility     scores on each form 

 1        0.98         0.95 
 2        0.93         0.90 
 3        0.98         0.83 
 4        0.96         0.93 
 5        0.94         0.92 
 6        0.66         0.70 
 7        0.89         0.93 
 8        0.78         0.82 
 9        0.65         0.78 
10        0.67         0.68 
11        0.93         0.94 
12        0.71         0.75 
13        0.76         0.91 
14        0.65         0.77 
15        0.65         0.81 
16        0.97         0.96 
17        0.64         0.87 
18        0.85         0.78 
19        0.76         0.78 
20        0.88         0.82 
21        0.83         0.90 
22        0.71         0.76 
23        0.62         0.74 
24        0.21         0.31 
25        0.39         0.65 
26        0.43         0.74 
27        0.83         0.85 
28        0.98         0.56 
29        0.79         0.67 

[The Form A Facility column is not legible in this copy.] 
adopt either of two strategies: a syntactic strategy of attending to the main clause 
and not the subordinate (French and Brown, 1977; Coker, 1978) and a semantic 
strategy in which the order of events is interpreted to respond to the actual order of 
occurrence (Clark, 1971; Coker, 1978). This semantic strategy is the more frequent 
when the child is cued to pay attention to both clauses; otherwise the syntactic 
strategy is more prevalent (Coker, 1978). As the children had already done 22 BTBC 
items some at least would be cued to attend to both clauses and therefore would be 
likely to apply the order-of-mention strategy. The use of either strategy does not 
imply partial knowledge or lack of knowledge of before and after. Contrary to 
Clark’s (1971) assertion, when tasks are designed to control strategy effectiveness, 
there is no evidence that after is more difficult than before (French and Brown, 
1977; Coker, 1978; Goodz, 1982; Trosborg, 1982). 


There were fewer errors on Form B than on Form A. On Form B the picture 
showing the after situation is positioned last in the row. Children’s comics place the 
last event in time at the end of a row of pictures. However, Boehm’s (1971) results, 
which show more errors on Form B than on Form A, do not support this 
explanation. Even so, it is possible that some children chose the last and incorrect 
picture on Form A for this reason. It is also conceivable that a few children may 
have been confused by the usage of after in the phrase ‘‘look after’’ which was 
observed in teachers’ discourse and in all of the reading schemes examined. Thus 
they might have chosen the picture showing the hairdresser ‘‘looking after’’ the 
girl’s hair. Others may have employed a first-in-line strategy by which they would 
have answered correctly on Form A. 


Item 24: almost 
‘‘. . . the bottle that is almost empty.’’ (Form A). 


‘‘. . . the basket that is almost full.’’ (Form B). 


In addition to almost item 24 also taps the concepts empty (Form A) and full 
(Form B). From 4 years onwards this is a very easy item on Form A with over 90 per 
cent of each age group achieving success and virtually no discrimination above 3 
years. On Form B item 24 is difficult with a success rate of only 21 per cent. Only 31 
per cent achieve the same score on both forms. Boehm (1971) records higher scores 
for Form B than for Form A. 
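
The two statistics quoted for each item, the facility value and the proportion of children with identical scores on both forms, can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names and the ten-child score lists are hypothetical, and it assumes each child is scored 1 (pass) or 0 (fail) on the item.

```python
from typing import Sequence


def facility(scores: Sequence[int]) -> float:
    """Proportion of children passing an item (1 = pass, 0 = fail)."""
    return sum(scores) / len(scores)


def identical_score_proportion(form_a: Sequence[int], form_b: Sequence[int]) -> float:
    """Proportion of children who obtain the same score on both forms."""
    if len(form_a) != len(form_b):
        raise ValueError("both forms must be scored for the same children")
    same = sum(1 for a, b in zip(form_a, form_b) if a == b)
    return same / len(form_a)


# Hypothetical scores for ten children on one item (not data from the study).
form_a_scores = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
form_b_scores = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

facility_a = facility(form_a_scores)
facility_b = facility(form_b_scores)
agreement = identical_score_proportion(form_a_scores, form_b_scores)
```

A pattern like the one reported for item 24, easy on one form and hard on the other, necessarily yields a low proportion of identical scores, because most children pass one version and fail the other.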


A child can pass Form A without comprehending the word almost. As no bottle 
is completely empty the word almost may be disregarded and the correct answer 
arrived at from the cue empty or most empty. The majority of wrong responses are 
the first bottle in the row which is half empty. On Form B only two children selected 
the apparently empty basket whereas 107 chose the full basket. This may have been 
because, as with Form A, almost is not understood or is ignored, and the basket 
which is most full selected. Or, the compelling notion of fullness (Donaldson and 
McGarrigle, 1974) may have overridden other criteria and influenced judgments. It 
is likely that the children who did pass item 24, Form B, understood the concept 
almost though some of these children may have used the first-in-line strategy. 
Failure on either Form indicates incomprehension of the word almost. 


Almost appears infrequently in infant school curricula and seems to be absent 
from nursery curricula. It was only observed in teachers’ discourse with 7-year-olds. 
Nearly and not quite are more common terms. 


Not only is the word almost apparently unnecessary for success in the first 
school and the design of Form A ill-conceived but the Form A drawings are of poor 
quality. The symbolic portrayal of water is not always understood by young 
children: 18 out of 30 children (aged 3:6-7:6) did not perceive any water in the bottle 
which is almost empty. (This was a small sample, distinct from the standardisation 
sample, questioned about their interpretation of the BTBC pictures disembedded 
from the test.) 


Item 25: half 
‘‘. . . the pie that is half gone.’’ (Form A). 


‘‘. . . the box that is half black.’’ (Form B). 


Facility values on Form A range from 0.28 at 3½ to 0.88 at 7 years with the 
highest discrimination value (0.40) at 6 years. The Form B version appears more 
difficult with an overall facility value of 0.39. Boehm (1971) also records higher 
scores for Form A. It seems therefore that the comprehension of half depends on 
context: young children are more familiar with a pie half gone than a box half black. 
Thus failure denotes a lack of understanding of half in certain contexts. Excepting 
those children who apply a first-in-line strategy a child would need to be able to 
judge half and know the label half in order to pass. 


Errors on Form A tend towards a preference for the larger amount which is five 
sixths of the pie; incorrect responses on Form B each depict only a quarter of the 
box. 


Half features throughout the nursery and infant school curriculum and is 
therefore a relevant concept. 


Item 30: other 
‘‘. . . the other one . . . the other pudding.’’ (Form A). 


‘‘. . . the other toy.’’ (Form B). 


Other appears frequently in nursery and infant school curricula. Item 30 (Form 
A) becomes steadily less difficult with age, discriminating best at ages 4:6 to 5:6. 
Forms A and B are of approximately equal difficulty. It is unlikely that (guesswork 
apart) a child could pass item 30 without understanding the concept other or fail if 
he did understand. 


In addition to item 30 a further 14 items on Form A and 14 items on Form B 
were judged to be reliable indicators of children's understanding or lack of 
understanding. It was judged that children who pass 13 of the Form A items and 12 of the 
Form B items (including item 23) can be assumed to have conceptual understanding 
but some may fail because of the item design whilst others will fail because they have 
not yet acquired the concept or its label. Children who pass 15 of the Form A items 
(including items 24 and 25) and 16 of the Form B items (including item 24) may or 
may not do so because they have reached conceptual understanding but it can be 
assumed that those who fail lack the necessary knowledge. Seven Form A items 
(including item 15) and seven Form B items (including items 15 and 25) neither 
reveal that a child understands the concept if he passes nor show that he necessarily 
lacks conceptual understanding if he fails. 


DISCUSSION 


It is a well-nigh impossible task to construct a foolproof picture test where each 
item measures understanding of a different concept. Complete acquisition of a 
dimensional concept cannot be presumed until the child has contrasted it with at 
least one other concept from the same dimension (Clark, 1980) and been retested in 
different contexts. Recent work on semantic development emphasises the context 
dependency of young children (e.g. Harris et al., 1978; Hoogenraad et al., 1978). 
Non-linguistic information interacts with lexical and syntactic knowledge 
(Donaldson and McGarrigle, 1974). Syntactic strategies appear to be independent of 
the meanings of terms (Coker, 1978). In the comprehension of adult speech, 
younger children rely more upon non-linguistic information than do older children 
and the range of this information is wider than was once supposed (Wilcox and 
Palermo, 1982). 


If the separate items of the BTBC are not all reliable tests of the concept each 
purports to measure, what does the total score reveal about a child’s present or 
future achievements? Attempts to identify the factors contributing to the BTBC 
score by use of principal components analysis were unfruitful. Other factor analytic 
studies have produced claims that the BTBC measures a single, general factor related to the 
acquisition of basic verbal concepts (Piersel and Reynolds, 1981), that its main 
components are perceptual-motor and verbal abilities (Brown, 1975), and that it 
functions as a general estimate of cognitive development and receptive language 
abilities (Steinert, 1978). 


The present analysis of the BTBC items suggests that the following components 
contribute to the total BTBC score: 


(a) mature conceptual understanding, 

(b) partial conceptual understanding, 

(c) lexical knowledge, 

(d) facility with complex syntactic constructions, 

(e) auditory receptive and association skills, 

(f) visual receptive and association skills, 

(g) the abilities to quantify, compare, order and seriate, 
(h) reversible thinking, 


all of which could be considered to be components of general cognitive 
development. Some support for these conclusions is given by concurrent and 
predictive correlational evidence (Estes et al., 1976; Ernhardt et al., 1977; Smith, 
1977; Steinbauer and Heller, 1978; Piersel and McAndrews, 1982). Additional to the 
components suggested above are the hidden factors which allow some children to 
perform well in a test and place others at a disadvantage (Zigler and Butterfield, 
1968; Tizard, 1974; Wohlwill and Heft, 1977; Anastasi, 1981). 


In sum, the BTBC may be described as a multidimensional test which may be 
considered as much a measure of young children's general cognitive development as 
it is of their understanding of the Boehm concepts. Conceptual understanding is 
likely to be context bound to some degree until complete mastery has been achieved. 
A BTBC item should be viewed as an indicator of a child's understanding of the 
concept term in the context of the item as presented and not as an indication of 
concept mastery. 


Conceptual knowledge that is still incomplete may yet be adequate for 
understanding the directions encountered during the first years at school. 
Observation of teachers and examination of curricula materials suggest that mastery 
of all 50 concept terms is not essential for success in British first schools though most 
children are likely to encounter them, in part at least, before their eighth birthday. 
Therefore it can be concluded that the content of the BTBC (as presently adapted) 
can be considered relevant to first schools in England though some children may not 
be familiar with all the terms used. 


As a quick, first step in diagnosing levels of cognitive functioning the English 
standardisation and modification of the BTBC could be useful to researchers, 
educational psychologists and remedial teachers. 
THE NEALE ANALYSIS OF READING ABILITY - REVISED 


By MARIE D. NEALE, M. P. McKAY and C. H. CHILDS 
(Monash University, Victoria, Australia) 


SUMMARY. In 1980 steps were undertaken in Australia to commence a review of the Neale Analysis 
of Reading Ability (Neale, 1958) which culminated in a revision and restandardisation of the test in 
1984. This paper outlines the major features of this revision and presents data on the reading 
performance of boys and girls from the standardisation sample drawn from two Australian states, 
Victoria and South Australia. The results overall suggest that girls and boys do not differ 
significantly in performance at different age levels, except in speed of reading (Rate) and word 
recognition skills (Accuracy), in their first year of schooling. It was also observed that differences 
exist between the original norms and the revised norms for Australian children and that continued 
use of the former would result in significantly biased estimation of the reading performance of 
Australian children. 


INTRODUCTION 

Since the Neale Analysis of Reading Ability was first published in 1958, more than 25 
years ago, it has become one of the most widely used tests of reading in the United Kingdom, 
Australia and New Zealand. The range of its usefulness and its versatility, from individual 
diagnostic work to large-scale research projects, is evident in the substantial body of studies 
that have used the Neale Analysis to measure reading performance e.g., in epidemiological 
studies (Rutter et al., 1970), surveys of reading standards (Bullock Report, 1975), differential 
diagnosis of reading performance (Yule, 1973; Yule et al., 1982), experimental studies 
concerning different treatment and/or teaching methods (Downing, 1965; Riding and Pugh, 
1977; Bradley and Bryant, 1981; le Coultre and Carroll, 1981; Dyson and Swinson, 1982), 
clinical studies (Stores, 1978; Chadwick et al., 1981; Rutter et al., 1980; Stedman and Van 
Heyningen, 1982), studies of reading disability (Hornsby and Miles, 1980; Badcock and 
Lovegrove, 1981; Lovegrove et al., 1982; Johnston et al., 1984; Hay et al., 1984), 
developmental studies of children's reading errors (Harding et al., 1985), studies of informal 
reading analysis (Burke, 1977) and extensive studies in schools using it as a measure of reading 
achievement (Goodacre et al., 1980). This record appears to vindicate the considerable 
groundwork involved in constructing the original measure to meet a theoretical position with 
specific criteria for a multifaceted appraisal of a child’s reading behaviour. 


At the time of the construction of the original Neale Analysis, i.e., during the early 1950s, 
reading assessments tended to mirror teaching methods emphasising graded word reading lists 
(e.g., Vernon, 1938; Schonell, 1948), or the reading of isolated and unrelated sentences (e.g., 
Holborn Reading Scale: Watts, 1948) and single prose passages which covered too wide an age 
range for discriminative measures in accuracy and comprehension, (e.g. Gates, 1947; 
Schonell, 1950). 


Alternatives in the form of silent reading tests presented considerable difficulties for 
differential diagnosis. The position taken by Neale (1956) placed reading behaviour as a 
language skill, within the context of child development theory, and pre-dated the 
psycholinguistic approach to understanding the reading process. The emphasis of the Analysis 
was to move from a piecemeal approach to reading difficulties, to a more comprehensive view 
of the child’s total personality and performance. The earlier views of Huey (1908), on the 
psychological processes involved in reading, and the relationship between the dynamics of 
social behaviour and communication emphasised by Allport (1955), all influenced the features 
adopted by the author for the Neale Analysis (Neale, 1958). Accordingly, its style of assessment 
of reading emphasised a positive interaction between the examiner and the child. Its 
systematic categorisation of errors and difficulties encountered by the child when reading 
provided for a broader, more flexible style of appraising children’s reading. The title given to 
the instrument sums up the break with an older philosophy of testing, and highlights the role 
of an ‘‘analysis’’ in making an informed, sensitive examination of an individual for 
subsequent tutoring. 
However, the widespread interest in literacy of the last two decades has brought about the 
need to review reading tests in order to take into account changing standards in reading, 
children's contemporary interests and the increased knowledge of the reading process and its 
evaluation. A number of criticisms of the Neale have been made since its publication (Brimer, 
1958; Vernon, 1958; Goodacre et al., 1980; Vincent et al., 1983; Pumfrey, 1984). Many have 
been minor in nature and apply equally to most other individual diagnostic tests. More 
important have been the criticisms directed towards the technical (constructional) properties 
and psychometric (scoring) properties of the test. At the time of publication, i.e., 1958, such 
information was available, but was considered to be of minimal relevance to the majority of 
users at the time. Today, of course, the level of training of test users, particularly teachers, has 
become more sophisticated and minimal standards for test constructors have been prescribed. 
While any test author knows how demanding a task it might be, tests must be updated to 
ensure the most reliable and valid outcomes for users. 


In 1980 steps were undertaken to commence a review of the Neale Analysis, culminating 
in a major revision of the test which was completed in 1984. Pilot studies had indicated that 
the format and style of the Neale, with its high motivation for the child, contributed to its 
validity in measuring reading. The revision, therefore, focused on updating the narratives, 
providing new norms and new styles of reference standards, including more statistical data 
and more suggestions for the diagnostic use of the test. 


METHOD 

Main features of revision 

A major change in the revision resulted in the adoption of two standardised, parallel 
forms. These forms have been retitled A-R and C-R implying a revision based on the old form 
of that title. As with the old Neale Analysis, ‘‘reading ages’’ are supplied, but separate norms 
and associated confidence intervals are now available for each form. Additionally, new 
standard scores are provided in the form of age-based percentiles and stanines for both 
parallel forms with associated confidence intervals. Scaled Scores based on the Rasch scaling 
method are also provided for Accuracy and Comprehension. A third form, The Diagnostic 
Tutor Form, is included as a closely related set of passages for the intensive investigation of an 
individual's reading behaviour under varying reading conditions (silent, oral, etc.), without 
experimenting upon, or distorting, the purpose of the standardised forms. This form has been 
extended by making available more passages at each grade level, and by adding a more 
extensive narrative at the lower secondary school level for assessing study skills and higher 
level comprehension. 


The two standardised forms retain the basic format of six graded passages of prose for 
each form, arranged as a "continuous" reading scale for children aged from 6 years to 11:11 
years (Neale, 1956, 1958). The revision, however, supplies data on the item difficulty of the 
passage content and the comprehension questions to meet the criterion of age differentiation. 
This information should go some way towards addressing the concerns of Goodacre et al. 
(1980) concerning the equivalence of the original two parallel forms. 


As in the original Neale Analysis, comprehension questions are asked following the oral 
reading of each passage, to test understanding of the main idea of the narrative, the sequence 
of events, details, and the ability to make some limited inferences. The term ‘‘errors’’ is 
retained to describe the reading inaccuracies which, in some current models of reading, are 
termed ‘‘miscues’’. These errors are used as a frequency count of difficulties which a child 
exhibits in his/her oral reading to obtain an ‘‘objective’’ measure of the accuracy with which a 
child recognises words. Unlike miscues, these errors are corrected orally by the examiner, 
thereby forming a prompt to the child to continue reading, with as much fluency as possible, 
beyond a level at which he/she might otherwise do so. An additional ‘‘informal’’ error 
analysis enables the examiner to analyse qualitatively the errors which are semantically or 
syntactically acceptable. The use of the revised Neale Analysis in this informal manner should 
enable the teacher to obtain insights concerning the degree of proficiency with which the child 
uses the major cueing systems for reading. 


A procedural change has also been incorporated into the revised Neale to improve the 
validity of measures of comprehension. Non-scored practice passages are now administered to 
enable the child to anticipate what is expected of him/her in the ensuing passages. It is 
expected that this procedure will reduce the emphasis on memory and enable the child to 
develop a ‘‘strategy for response” from the practice passage. Additionally, this procedure will 
enable the examiner to establish objectively the basal passage for beginning the formal 
assessment. 


Sampling 

Between 1980 and 1984 approximately 1,100 children from two Australian States, 
Victoria and South Australia, were selected as the sample for the restandardisation. The 
Victorian sample consisted of approximately 650 children drawn from 44 government schools. 
They were chosen by means of a stratified random sampling procedure based on socio- 
economic status of local government areas (LGAs) of Melbourne, derived from the 1976 
Commonwealth Census. Samples of schools in each stratum were drawn proportional to the 
size of the number of LGAs as designated in those strata. The testing was conducted by 44 
practising teachers enrolled in a post-graduate teacher training course, at Melbourne College 
of Advanced Education. 


Selection of individual children from each school was on the basis of birthdays which fell 
nearest to a set of randomly chosen calendar dates. Equal numbers of boys and girls from 
grade levels 1 to 6 were selected at each school. Approximately 25 per cent of these children 
came from homes where English was not the language of origin. These figures are consistent 
with the contemporary ethnic mix in the Australian community. 
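
The birthday-based selection described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the tuple layout of the roll and the tie-breaking (first child in list order) are hypothetical, and the paper does not say whether distance was measured around the turn of the year, which this sketch assumes.

```python
from datetime import date


def day_of_year(month: int, day: int) -> int:
    """Day number (1-365) in a fixed non-leap reference year."""
    return date(2001, month, day).timetuple().tm_yday


def circular_gap(a: int, b: int, year_length: int = 365) -> int:
    """Days between two day-numbers, allowing wrap-around at the year end."""
    diff = abs(a - b)
    return min(diff, year_length - diff)


def select_nearest(children, target_days):
    """For each target day, pick the not-yet-chosen child with the nearest birthday.

    `children` is a list of (name, birth_month, birth_day) tuples.
    """
    remaining = list(children)
    chosen = []
    for target in target_days:
        best = min(
            remaining,
            key=lambda c: circular_gap(day_of_year(c[1], c[2]), target),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical class roll and target dates (2 January and 1 June).
roll = [("A", 1, 10), ("B", 6, 15), ("C", 12, 30)]
targets = [day_of_year(1, 2), day_of_year(6, 1)]
selected = select_nearest(roll, targets)
```

In practice the target dates would be drawn at random once and applied to every school, and the selection would be run separately for boys and girls within each grade to obtain equal numbers.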


Two South Australian samples consisted of one drawn from 35 schools in 1981 and a 
further sample drawn from 23 schools in 1984. The testing in each case was conducted by final 
year pre-service student teachers and some in-service teachers enrolled together in a 
professional unit at the Salisbury Campus of the SACAE, Adelaide. In the 1981 sample it was 
not possible to use a stratified random sampling procedure for the whole of South Australia 
due to logistical constraints. An alternative sampling method was employed whereby a list of 
schools accessible to the students was checked against the records of the Education 
Department of South Australia with respect to the proportion of children in each school in 
receipt of educational materials under the ‘‘Free Book’’ scheme. The statistic so formed was 
thus a sensitive and multi-faceted indicator of the prevalence of family and educational 
disadvantage in a specified locality. The second South Australian sample in 1984 was selected 
to provide a better overall balance of four identified strata in the South Australian community 
(Forster, 1984). 


The South Australian sample in toto thus consisted of 395 children spread evenly across 
grade levels 1-6, with equal numbers of boys and girls sampled on the basis of randomly 
chosen calendar dates, as for the Victorian sample. The sample appeared appropriate in its 
representation of rural communities, diverse urban strata, non-government schools, and 
variety of ethnic backgrounds. 


Testing procedure and materials 

Detailed training in test administration and scoring was given to data collectors in both 
States, by means of workshops and video presentations. These emphasised the distinction 
between testing children for research purposes under standard conditions as against testing 
children for diagnostic or evaluative purposes. 


Each child was administered two revised forms of the Neale (A-R and C-R or A-R and 
The Diagnostic Tutor), printed in an experimental edition similar to the original test booklet. 
Supplementary language and cognitive tests were also administered to provide information on 
concurrent validity. These included Schonell’s Graded Word Reading Test (Schonell and 
Goodacre, 1974), the Vocabulary and Similarities subtests of the WISC-R (Wechsler, 1974), 
and for the younger students the Articulation and Initial Sound subtests of the Neale Scales of 
Early Childhood Development (Neale, 1976). All tests were usually administered in a single 
session of approximately one hour. A second session was needed in a few instances due to 
interruptions from school timetabling or when the child's concentration was felt to have 
lapsed. Testing was completed within one or two days thereafter. 


Order effects for the two forms of the Neale were counterbalanced by having half of the 
testers administer the tests in reverse order. Schonell's Graded Word Reading Test and the 
WISC-R Vocabulary and Similarities subtests were then given subsequently in that order to all 
children. Tape recorders were used to record children’s responses on the Neale, so that word 
recognition errors, verbatim comprehension answers, and time taken to read the passages 
could be rigorously transcribed. All information, including transcripts of comprehension 
answers, was collected for each child. Testing was conducted mainly in August/October (i.e., 
late Term II, early Term III in Australia) at a time when two full terms of instruction would 
have been completed. All testing was carried out in a quiet place in each child's home school. 


Scoring 

The raw scores for number of errors in word recognition, number of correct answers to 
the comprehension questions, and time taken to read aloud were recorded for each passage 
attempted by the child. A cut-off point of 25 errors was applied with respect to the new forms 
of the test instead of Neale's original cut-off criterion of 16 word recognition errors. It was 
reasoned that as there was no statistical evidence for believing the old and new forms to be 
equivalent, children should continue to read beyond the old criterion level in order to establish 
the progressive difficulty of the passages in each form. In practice it was found that this often 
put strain on the child to read beyond his/her actual capacity, but the data obtained 
reinforced Neale's original decision in 1956 to set the limit at 16 errors. Information was also 
sought separately on each child's developmental history, language background, and the 
methods being used to teach them reading. 
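
The effect of raising the cut-off from 16 to 25 errors can be illustrated with a short sketch. It assumes, as a simplification not stated in the text, that the cut-off is applied passage by passage and that testing stops after the first passage on which the error count reaches the cut-off. The function name and the error counts are hypothetical.

```python
from typing import Sequence


def passages_administered(errors_per_passage: Sequence[int], cutoff: int) -> int:
    """Number of passages read before the discontinuation rule stops testing."""
    count = 0
    for errors in errors_per_passage:
        count += 1
        if errors >= cutoff:
            break  # the passage that reaches the cut-off is the last one read
    return count


# Hypothetical word recognition errors on six graded passages for one child.
errors = [2, 5, 9, 17, 26, 30]

read_with_original_cutoff = passages_administered(errors, cutoff=16)
read_with_trial_cutoff = passages_administered(errors, cutoff=25)
```

With these figures the child reads one further passage under the 25-error rule, which is the extra reading beyond capacity that the paragraph above reports.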


RESULTS AND DISCUSSION 
Age differences 
Distribution statistics for both States are reported in Tables 1 and 2. 


TABLE 1 


MEAN (CONVERTED RAW SCORES) AND STANDARD DEVIATIONS OBTAINED FOR RATE, ACCURACY AND 
COMPREHENSION ON BOTH REVISED FORMS OF THE NEALE ANALYSIS OF READING ABILITY 


VICTORIA 

Form C-R 

Age               Rate         Accuracy     Comprehension 
                  Mean (SD)    Mean (SD)    Mean (SD) 

BOYS N = 309 
 6:0 -  6:11      ?  (?)       17 (9?)       6 (5?) 
 7:0 -  7:11      39 (?)       23 (4?)       8 (4?) 
 8:0 -  8:11      49 (24)      32 (23)      11? (8?) 
 9:0 -  9:11      62 (19?)     53 (23)      20 (11) 
10:0 - 10:11      ?  (?)       ?  (?)       23 (9?) 
11:0 - 11:11      84 (31)      67 (21)      29 (?) 
12:0+             99 (32)      72 (21)      28 (9?) 

GIRLS N = 312 
 6:0 -  6:11      35 (?)       21 (?)        7 (9?) 
 7:0 -  7:11      48 (24)      29 (15)      10 (5) 
 8:0 -  8:11      58 (15?)     36 (15)      13 (6) 
 9:0 -  9:11      71 (33)      51 (24)      20 (?) 
10:0 - 10:11      89 (?)       66 (22)      23 (9) 
11:0 - 11:11 
12:0+             87 (3?)      72 (?)       26 (9) 

[? marks a reading that is doubtful or illegible in the scan. Only one row of values survives for the two oldest girls' age bands, and the remaining form's columns are not legible.] 
TABLE 2 


MEAN (CONVERTED RAW SCORES) AND STANDARD DEVIATIONS OBTAINED FOR RATE, ACCURACY AND 
COMPREHENSION ON BOTH REVISED FORMS OF THE NEALE ANALYSIS OF READING ABILITY 


SOUTH AUSTRALIA 

[Table body not recoverable from the scan. Surviving labels: Form A-R; Accuracy; Comprehension; SD; BOYS N = 192; age bands 6:0 - 6:11 to 12:0+.] 



While it is not the intention of this paper to interpret critically these results some 
comments seem appropriate. 


One-way analyses of variance between States by age group were carried out for each 
subtest Rate, Accuracy and Comprehension and for each Form A-R and C-R separately. A 
summary of these results is reported in Table 3. 
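
For readers unfamiliar with the procedure, the F ratio for a one-way analysis of variance can be computed as in the sketch below. It is a generic textbook calculation and not the authors' own analysis; the function name and the two small groups of Accuracy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_scores: List[float] = [x for g in groups for x in g]
    n_total = len(all_scores)
    k = len(groups)
    grand_mean = sum(all_scores) / n_total

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical Accuracy scores for one age group in each State.
victoria = [20.0, 24.0, 28.0]
south_australia = [16.0, 20.0, 24.0]

f_ratio, df_b, df_w = one_way_anova_f([victoria, south_australia])
```

The F ratio is then referred to the F distribution with the two degrees of freedom to obtain the significance levels of the kind summarised in Table 3. With only two groups, F equals the square of the independent-samples t statistic.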


TABLE 3 


A SUMMARY OF THE RESULTS OF ONE-WAY ANOVAS BETWEEN THE STATES FOR RATE, ACCURACY 
AND COMPREHENSION ON THE REVISED NEALE ANALYSIS OF READING ABILITY 


                        Form A-R                       Form C-R 
Age               Rate     Accuracy  Comp.       Rate     Accuracy  Comp. 
 6:0 -  6:11      NS       0.05      NS          NS       NS        NS 
 7:0 -  7:11      *0.01    0.0001    NS          NS       NS        NS 
 8:0 -  8:11      NS       0.01      NS          NS       NS        NS 
 9:0 -  9:11      NS       0.01      NS          0.05     0.05      NS 
10:0 - 10:11      NS       NS        NS          NS       NS        NS 
11:0 - 11:11      NS       NS        NS          NS       NS        NS 


* Signifies significant F-value at indicated level. 
It was reasoned that if real differences occurred between the States at different age levels, 
such differences should be evident across both parallel forms and generally across the three 
aspects of reading measures. However, this was not the case, with the ANOVAs revealing few 
differences between age groups for either Form A-R or C-R. There was some suggestion in the 
Form A-R results that a difference in the ‘‘Accuracy’’ of reading (i.e., word recognition 
skills) persists between States up to around 9-10 years of age, perhaps reflecting the influence 
of different instructional methodologies. Two-way ANOVAs for each group on the Accuracy 
scores, which included the interacting effects of grade, indicated significant between-state 
effects for age with grade, i.e., differences between children of a similar age occurred if such 
children were separated by grade levels. However, children of similar age and grade in each 
state did not differ significantly in performance. Given that children in Victoria complete a 
preparatory grade before entering Grade 1, compared to South Australian children, who have 
direct entry to Grade 1, this result is not surprising. Children in Victoria were 10 months older 
on average at any given grade level, compared to the South Australian sample. It is of 
considerable worth that the Neale Analysis was capable of reflecting these developmental 
differences since such age differentiation is an important part of the construct validity of such 
a test (Anastasi, 1976). Overall these results suggest little in the way of stable differences 
between the States. No differences in reading Comprehension emerged at all. 


Sex differences 

The general trend evident in these data, for both States separately, and in the combined 
data, was for girls to score slightly higher than boys at most age levels on all indices of reading 
— Rate (words read per minute), Accuracy (accuracy of word recognition) and 
Comprehension (number of questions answered correctly). This slight edge that girls seemed 
to have over boys was more evident in the youngest age groups. 


In order to explore the significance of these observed differences between boys and girls, 
t-tests for independent samples were carried out on both forms of the revised Neale for each 
separate age group. Tables 4 and 5 report a summary of the outcome of these analyses for each 
State separately. 
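
The comparison made for each age group can be sketched as below. The sketch assumes the pooled-variance (Student) form of the independent-samples t-test, which the text does not specify; the scores and the names `independent_t`, `girls` and `boys` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t, df) for a pooled-variance t-test on two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical Rate scores (words read per minute) for one age group
girls = [10, 12, 14, 16]
boys = [8, 10, 12, 14]

t_value, df = independent_t(girls, boys)
print(t_value, df)
```

With two groups, the square of this t equals the F ratio from a one-way analysis of variance on the same data, so the two approaches agree.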


TABLE 4

SUMMARY OF T-TEST RESULTS: SEX DIFFERENCES IN PERFORMANCE ON
REVISED NEALE ANALYSIS OF READING ABILITY BY AGE:
FORM A-R

VICTORIA
Age             Rate      Accuracy   Comprehension
6:0-6:11        < 0.01    < 0.05     NS
7:0-7:11        NS        NS         NS
8:0-8:11        NS        NS         NS
9:0-9:11        NS        NS         NS
10:0-10:11      NS        NS         NS
11:0-11:11      NS        NS         NS

SOUTH AUSTRALIA
Age             Rate      Accuracy   Comprehension
6:0-6:11        < 0.05    < 0.05     NS
7:0-7:11        NS        NS         NS
8:0-8:11        NS        NS         NS
9:0-9:11        NS        NS         NS
10:0-10:11      NS        NS         NS
11:0-11:11      NS        NS         NS


TABLE 5

SUMMARY OF T-TEST RESULTS: SEX DIFFERENCES IN PERFORMANCE
ON REVISED NEALE ANALYSIS OF READING ABILITY BY AGE:
FORM C-R

VICTORIA
Age             Rate      Accuracy   Comprehension
6:0-6:11        NS        NS         NS
7:0-7:11        NS        NS         NS
8:0-8:11        NS        NS         NS
9:0-9:11        NS        NS         NS
10:0-10:11      NS        NS         NS
11:0-11:11      NS        < 0.05     NS

SOUTH AUSTRALIA
Age             Rate      Accuracy   Comprehension
6:0-6:11        < 0.01    < 0.05     NS
7:0-7:11        NS        NS         NS
8:0-8:11        NS        NS         NS
9:0-9:11        NS        NS         NS
10:0-10:11      NS        NS         NS
11:0-11:11      NS        NS         NS





In Victoria girls were significantly faster than boys at reading and more accurate at the 
6:0-6:11 year-old level on Form A-R. All other comparisons between boys and girls at each 
age level on this revised form, for Rate, Accuracy and Comprehension of reading, indicated 
no significant differences. Examination of Form C-R results revealed that the only significant 
difference between the sexes in Victoria occurred for Accuracy of reading at the 11:0-11:11 
year-old level. 


A pattern of differences similar to that in Victoria also emerged between South Australian boys
and girls at the 6:0-6:11 year-old level on Form A-R, with girls reading faster and more
accurately. No significant differences in Comprehension occurred on this form of the test for
any of the age groups. On Form C-R, the South Australian children showed the same
pattern of differences as on Form A-R, with
girls reading faster and more accurately than boys at 6:0-6:11 years. Since this was virtually a
replication of the results in Victoria for Form A-R, it could be considered that a real difference
between the sexes, favouring girls, occurred at the 6-7 year-old level in speed of reading and
word recognition skills. However, no significant differences occurred in Comprehension at
this age level, or at any other age level, on either form of the revised test.


It would be difficult, therefore, to characterise the differences between boys and girls at 
this early age level as a difference in reading ability per se. The possibility of an inherent sex 
bias in both forms of the test, which favours girls in the earlier passages, was considered. 
However, this seemed highly improbable upon inspection of the content of the two relevant
passages for each form, which deal respectively with “Cats and kittens!”, “A surprise
parcel!”, “A bird in garden!” and “An incident with bicycles!”. A more likely possibility is
that this outcome reflected the often observed phenomenon of the early linguistic fluency of 
girls, and may also have reflected their early responsiveness to teaching, noted in some studies 
(e.g., Thompson, 1975). 


Finally, a comparison was also made of the results of the Australian normative sample,
using the old UK norms and the new norms for Form A-R. Form A has been used most
extensively in research and has achieved a high level of acceptability for its reliability. Netley et
al. (1965) reported Form A of the Neale as the most reliable of the three forms (test-retest
reliability) and recommended it as a generally reliable test of reading skills. Tables 6 and 7
provide comparative results for the Australian standardisation sample for Form A-R using the
old UK norms and the newly derived norms.


TABLE 6
READING AGES — COMPARATIVE RESULTS FOR NEALE ANALYSIS OF READING ABILITY — ACCURACY

Years of                   Months of Reading Age
Reading Age                0 1 2 3 4 5 6 7 8 9 10 11

 5   O. UK
     Form A-R    [illegible] 2-3 4-5 6 7-8 9 10
 6   O. UK       1 2 3 4 5-6 7 8 9-10 11
     Form A-R    11 12 13 14 15 16 17 18 19 20 21 22
 7   O. UK       12-13 14 15-16 17 18 19-20 21 22 23 24 25 26-27
     Form A-R    23 24 25 26 27 28 29 30 31 32
 8   O. UK       28 29 30 31 32 33 34-35 36-37 38 39-40 41-42 43
     Form A-R    33 34 35 36 37 38 39 40 41 42 43
 9   O. UK       44-45 46-47 48 49-50 51-52 53 54 55 56 57 58 59-60
     Form A-R    44 45 46 47 48 49 50 51 52 53 54
10   O. UK       61 62 63 64 65 66 67 68 69 70 71-72 73
     Form A-R    55 56 57 58 59 [remainder illegible]
11   O. UK       74 75-76 [illegible] 78 79 80 81 82 83-84 85 86 87
     Form A-R    67 [illegible] 69 70 71 72 73 74 75

O. UK = Original UK norms


It can be seen for Accuracy that the old UK norms would overestimate reading ages for
Australian children by up to 12 months of reading age at the lowest age levels (i.e., up to 9
years old) and underestimate by up to 10 months the performances of older children.
Comprehension estimates, on the other hand, do not vary greatly between 7 years and 8:7
years of age. The old norms would have overestimated reading age for this aspect of reading
ability below 7 years and underestimated comprehension results by up to 11 months above 9
years of age.


It is difficult to indicate the significance of these different estimates in the absence of 
confidence intervals concerning the original norms. However, confidence intervals supplied 
with the revised Neale indicate that differences greater than 8 months of reading age for Rate, 
4 months for Accuracy and 5 months for Comprehension respectively, would constitute 
significant variation in reading results. Using these criteria, it would appear that significant
differences do exist between the two sets of norms for Australian children and that continued
use of the old norms would result in biased estimates of reading ability for this group.


TABLE 7

READING AGES — COMPARATIVE RESULTS FOR NEALE ANALYSIS OF READING ABILITY —
COMPREHENSION

Months of Reading Age
0 1 2 3 4 5 6 7 8 9 10 11

[Table body not recoverable]

O. UK = Original UK norms
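
The significance criteria quoted from the confidence intervals of the revised Neale (differences greater than 8 months of reading age for Rate, 4 for Accuracy and 5 for Comprehension) can be applied mechanically, as in the following sketch. The function name `exceeds_criterion` and the reading ages passed to it are hypothetical; only the three thresholds come from the text.

```python
# Minimum differences, in months of reading age, treated as significant
# (thresholds taken from the confidence intervals quoted in the text)
SIGNIFICANT_DIFFERENCE_MONTHS = {"Rate": 8, "Accuracy": 4, "Comprehension": 5}


def exceeds_criterion(measure, old_norm_months, new_norm_months):
    """True if the two reading-age estimates differ by more than the criterion."""
    difference = abs(old_norm_months - new_norm_months)
    return difference > SIGNIFICANT_DIFFERENCE_MONTHS[measure]


# Hypothetical reading ages, expressed in months, for the same raw score
accuracy_flag = exceeds_criterion("Accuracy", 90, 84)            # 6-month gap
comprehension_flag = exceeds_criterion("Comprehension", 90, 86)  # 4-month gap
print(accuracy_flag, comprehension_flag)
```

On these criteria a 6-month discrepancy in Accuracy would count as a significant difference between the two sets of norms, whereas a 4-month discrepancy in Comprehension would not.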


A further implication of the above results arises in relation to diagnosis of reading 
disability. A number of authors (e.g., Yule, 1973) have argued for a definition of reading 
retardation based on the degree of discrepancy between “reading age” and chronological age
after controlling for intelligence. Obviously such a definition depends on what is the practical,
functional meaning of having a reading age less than one's chronological age. At least one
vital parameter of such a definition concerns the accuracy of the “reading age” obtained,
which in turn is dependent on the validity of the norms of the test used. These findings suggest 
that the use of the old norms for the Neale would significantly underestimate or 
overestimate reading ages for contemporary Australian children at different age levels. 
Caution, therefore, must be observed in estimating the seriousness of any obtained variation 
from normative standards alone. The present data would suggest that continued use of the old 
norms of the Neale for Australian school children would result in the possibility of incorrect 
classification of children as “backward” readers.


CONCLUSION


This paper has outlined the major features of the revised Neale Analysis and presented 
data on the reading performance of boys and girls from the standardisation sample. 


The comparison of the reading results of Victorian and South Australian children in the 
present paper highlights the importance of obtaining up-to-date normative and local data for 
gauging standards of reading. The contrasting of results using the old UK norms and the 
contemporary norms raises a number of points of interest. Comparison of reading ages 
derived from the old norms with those derived from the revised norms suggests that the new 
norms for Australian children at the upper range of primary school are now closer to age 
expectancy than the old ones for all indices of reading. Continued use of the old norms would 
thus significantly underestimate Australian children’s reading performance at these levels. It is 
worth noting in this regard that Andrews and Elkins (1971) recommended a revision of the 
norms for Accuracy and Comprehension on the old Form A in line with the present findings, 
based on data from 250 children tested on the old form in Queensland. 


Overall, the revised Neale Analysis has retained the features which have proven it to be a 
reliable and valid instrument for reading research and assessment. At the same time it has 
incorporated significant modifications which are designed to meet, on the one hand, more 
contemporary demands for supplementary technical information and, on the other, current 
trends in education for informal, flexible approaches to the diagnosis of reading ability. 


Requests for reprints should be addressed to Professor Marie D. Neale, Dinah and Henry Krongold 
Centre for Exceptional Children, Monash University, Clayton, Victoria, Australia 3168.


APPROACHES TO STUDYING BY FILIPINO STUDENTS: A


SUMMARY. The responses of 212 male and 213 female Filipino students to the short version of
Entwistle's Approaches to Studying Inventory (ASI-S), measures of self-esteem, locus of control,
and field-independence were obtained at two testing sessions six months apart. A comparison of the
factor structure of the responses to the ASI-S at both points in time confirmed the relative stability
of the underlying dimensions. A multivariate analysis of variance indicated a main effect over time
(i.e., that the students' approach to learning had changed significantly) but these changes were not
related to sex, IQ, school grade, self-esteem, locus of control, or field independence. It appears that
as these students progressed through secondary school many of them came to rely less on rote
learning methods, but this change was not related to the personological variables examined here.


INTRODUCTION 
The last 10 years have seen the emergence of an approach to the study of human learning 
which focuses on relating the students’ motivations and conceptions of their own learning to 
the learning strategy they adopt. One of the strengths of this approach has been the relative 
convergence of findings from distinctive research traditions: the qualitative as exemplified by 
Marton and his colleagues in Sweden and the quantitative as exemplified by Entwistle and 
others in the United Kingdom (Marton and Säljö, 1976; Entwistle et al., 1979).


Two major deficiencies of research in this field to date have been the relative lack of 
longitudinal studies and investigations with students from third world countries. Longitudinal 
studies are important as they can contribute knowledge as to whether approaches to study are 
consistent over time (still a controversial issue in this area). Such knowledge may lead to better 
understanding both of the way that students change their learning strategies during their 
lengthy progress through our educational systems and of ways of encouraging the adoption of 
strategies which lead to higher level learning outcomes. The increasing importance of 
education in developing countries and the increasing numbers of students from such countries 
studying at our educational institutions (a number of whom seem to experience learning 
difficulties, e.g., Eng and Manthei, 1984) would seem to highlight the need to establish the 
cross-cultural validity of Marton and Entwistle’s work. 


Previous research from within the quantitative tradition has confirmed the stability but 
questioned the precise nature of the factor structure underlying Entwistle’s Approaches to 
Studying Inventory (ASI). Further, these studies have failed to demonstrate that the 
superficial approach to studying utilised by all too many Australian students at the start of 
their tertiary education underwent any improvement during the course of their studies 
(Watkins and Hattie, 1985). The writers have also confirmed factor analytically the 
motive/strategy model supposedly underlying the Study Process Questionnaire (Biggs, 1979) 
for Australian but not Filipino students (Hattie and Watkins, 1981; Watkins and Astilla, 
1982). However, deep-level rather than surface-level learning strategies were found to be more
conducive to academic success at Filipino schools (Watkins and Astilla, 1982; Watkins,
1984).


The purpose of the research reported here was to investigate the stability of the approach 
to school learning of Filipino students as measured by the short version of the Approaches to 
Studying Inventory, ASI-S (Entwistle, 1981). It is thus relevant both to the development of the 
learning processes of these students and to the validity of the ASI-S itself. A range of
personological variables was also examined as possible moderators of any changes in student
learning processes over time.


METHOD 


Subjects 

The subjects in the first stage of this research were 220 males and 225 females who were in 
their second year of study at a prestigious private school in the central Philippines. The great 
majority were 13 to 14 years of age and came from upper class or professional families. 
English was the language of instruction at this school. At the second testing, six months later, 
data were provided by 212 males and 213 females from the original sample. 


Instruments 

The ASI-S is a 30-item, 7-scale questionnaire (see Table 1 for a description of the scales). 
Analysis of responses to the ASI-S at the first testing indicated rather low internal consistency 
coefficients for each of the scales — only to be expected given the shortness of these scales. 
However, the factor structure was much as intended by Entwistle (see Table 2). 


TABLE 1
NAMES AND DESCRIPTIONS OF CONTENT OF SCALES OF
“APPROACHES TO STUDYING INVENTORY” (SHORT VERSION)

Name of Scales             Description of content of scales

Achieving orientation      Indicates well-organised study methods and achievement motivation
                           (6 items).

Reproducing orientation    Assesses tendency to rote learn and be syllabus bound. Motivation is
                           extrinsic (6 items).

Meaning orientation        Emphasis on a deep approach to learning. Motivation is intrinsic and
                           achievement oriented (6 items).

Comprehension learning     Assesses a tendency to seek broad understanding from the beginning
                           (3 items).

Operation learning         Preference for building understanding based on logic rather than intuition
                           (3 items).

Improvidence               A learning pathology characterised by overreliance on details and failure to
                           develop overall understanding (3 items).

Globetrotting              Assesses tendency to jump to conclusions without adequate factual basis
                           (3 items).





In addition to the ASI-S, questionnaires designed to measure the following variables were
used: academic locus of control (Crandall’s, 1978, short version of the “Intellectual
Achievement Responsibility Scale”); quality of family relationships (Watkins and Astilla,
1980); field independence (Fitzgerald and Hunt, 1977); and attitudes to school (Watkins and
Sampson, 1974). The reliability and validity of the first two of those instruments for Filipino
children have been reported elsewhere (Watkins, 1982) and the latter measures have proved
their adequacy in several Filipino and Australian studies. A 10-item “true-false” questionnaire
was specially designed to measure academic self-esteem. A typical item was “I get good
marks in most school subjects”. This inventory was found to have an internal consistency
reliability coefficient α of 0.53, which is probably acceptable for exploratory purposes. All
items were carefully assessed for intelligibility and relevance to Filipino children by three
teachers who were familiar with this area of the Philippines. Details of students' ages, father's
and mother's level of education, IQs, and academic grades were obtained from the school
records.
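
An internal consistency coefficient of the kind reported for the self-esteem questionnaire can be computed as sketched below. The sketch assumes the coefficient is Cronbach's α (which for true-false items scored 0 or 1 coincides with KR-20); the text does not say how it was calculated, and the responses and the name `cronbach_alpha` are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    # Variance of each item across respondents
    item_variances = [
        pvariance([respondent[i] for respondent in responses])
        for i in range(n_items)
    ]
    # Variance of the respondents' total scores
    total_variance = pvariance([sum(respondent) for respondent in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical true-false answers (1 = "true") of four pupils to three items
answers = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

alpha = cronbach_alpha(answers)
print(alpha)
```

Because α rises with the number of items, short scales such as those of the ASI-S would be expected to show lower coefficients than longer inventories.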


The data from the first testing indicated that a deep-level approach and a competitive, well-organised
approach are positively related to academic success. However, a tendency towards
reproducing information or the pathology of globetrotting was negatively correlated with
school grades. In addition, being female, having higher academic self-esteem and an internal
locus of control for success but not failure were found to be significantly related to a deep-level
approach to learning (Watkins, 1984; see also Table 2).


RESULTS 


The responses to the ASI-S on both occasions were subjected to exploratory common 
factor analysis (maximum-likelihood method), followed by orthogonal and oblique rotations 
to simple structure. Two clear factors were obtained on both occasions and there was a fairly
close resemblance between the factors over time, the congruence coefficients being 0.85
(Factor I) and 0.95 (Factor II). The ASI-S scale factor loadings from the second testing
Varimax solution are shown in Table 3. As Entwistle had predicted, Factor I had high loadings
from the Achieving Orientation, Meaning Orientation, Comprehension Learning and 
Operation Learning scales, while Factor II had high loadings from the other ASI-S scales. 
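
A congruence coefficient between two sets of factor loadings can be sketched as follows, assuming the usual Tucker formula (the sum of cross-products divided by the square root of the product of the sums of squares). The Factor I loadings for the second testing are those of Table 3; the first-testing loadings are hypothetical, since they are not reported here, and the name `congruence` is introduced for the example.

```python
from math import sqrt


def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two vectors of factor loadings."""
    cross_products = sum(a * b for a, b in zip(loadings_a, loadings_b))
    sum_sq_a = sum(a * a for a in loadings_a)
    sum_sq_b = sum(b * b for b in loadings_b)
    return cross_products / sqrt(sum_sq_a * sum_sq_b)


# Factor I loadings of the seven ASI-S scales at the second testing (Table 3)
factor_1_second = [0.65, 0.16, 0.67, 0.69, 0.57, 0.14, 0.05]
# Hypothetical Factor I loadings for the first testing, for illustration only
factor_1_first_hypothetical = [0.60, 0.20, 0.70, 0.65, 0.50, 0.20, 0.10]

coefficient = congruence(factor_1_second, factor_1_first_hypothetical)
print(coefficient)
```

A coefficient of 1 indicates loadings that are proportional to one another, and a coefficient of 0 indicates no resemblance between the two factors.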


Confirmatory factor analysis (cf. Watkins and Hattie, 1981) was used to test whether the 
factor structure that would be expected of the ASI-S according to Entwistle was in fact found 
on both testing occasions. The indices of model adequacy obtained are shown in Table 4 and 
can be seen to demonstrate quite a good fit between the hypothesised model and both sets of 
data. 
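
One way to read the chi-square values of Table 4 is to convert them to probabilities. The sketch below assumes that the first two columns of Table 4 are a chi-square statistic and its degrees of freedom, and uses the closed-form upper-tail probability that exists when the degrees of freedom are even; the function name `chi2_upper_tail_even_df` is introduced here for illustration.

```python
from math import exp, factorial


def chi2_upper_tail_even_df(chi_square, df):
    """Upper-tail probability of a chi-square value; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = chi_square / 2
    # Closed-form survival function: exp(-x/2) * sum of (x/2)^i / i!
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


p_first = chi2_upper_tail_even_df(12.51, 8)   # first testing
p_second = chi2_upper_tail_even_df(11.46, 8)  # second testing
print(round(p_first, 2), round(p_second, 2))
```

On this reading the probabilities are roughly 0.13 and 0.18, neither of which would lead to rejection of the hypothesised model at conventional levels.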





TABLE 3
VARIMAX FACTOR LOADINGS OF ASI-S SCALES (SECOND TESTING)

ASI-S Scales               Factor I    Factor II
Achieving orientation      0.65        0.27
Reproducing orientation    0.16        0.75
Meaning orientation        0.67        0.00
Comprehension learning     0.69        0.18
Operation learning         0.57        0.09
Improvidence               0.14        0.56
Globetrotting              0.05        0.52


TABLE 4

INDICES OF ADEQUACY OF FIT OF THE HYPOTHESISED FACTOR MODEL TO THE DATA ON BOTH OCCASIONS

                                              No. of residuals
                   χ²        df      PA       > .10
First Testing      12.51     8       1.13     0
Second Testing     11.46     8       0.93     0





The means of the ASI-S scales on the first and second testings for both male and female 
subjects are shown in Table 5. As in the first testing (Watkins, 1984), the females were 
significantly (P < 0.01) more likely than the males to have scored highly on the meaning
orientation and comprehension learning scales but less highly on the reproducing orientation 
and globetrotting scales. 


These data were analysed by means of a series of repeated measures multivariate analyses
of variance (MANOVA) using the SPSS-X program (Hull and Nie, 1984). The results of these
MANOVAs were quite consistent. In each case the main effect of “time” was significant at
the 0.01 level but there was no significant interaction with variables such as sex, self-concept,
locus of control, IQ, grades, or field independence.


TABLE 5
MEANS OF ASI-S SCALES FROM FIRST AND SECOND TESTING FOR MALES AND FEMALES

                           Means from First Testing    Means from Second Testing
ASI-S Scales               Male       Female           Male       Female
Achieving orientation      17.42      17.81            17.23      17.11
Reproducing orientation    16.19      14.52            15.36      13.43
Meaning orientation        17.07      18.37            17.07      17.87
Comprehension learning     8.32       9.02             8.16       8.97
Operation learning         8.12       8.23             8.28       8.26
Improvidence               7.80       7.74             7.72       7.13
Globetrotting              6.42       5.68             6.50       5.27





Univariate analysis indicates that only the “reproducing orientation” scale
contributes significantly (P < 0.01) to the main effect over time. The means of this variable
decline over time.
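
The size of the change on each scale can be read directly from the means in Table 5, as in this short sketch. The values are those of Table 5; the names `TABLE_5_MEANS` and `mean_change` are introduced for the example, and the calculation is purely descriptive rather than a reproduction of the univariate tests.

```python
# Scale means from Table 5: (male first, female first, male second, female second)
TABLE_5_MEANS = {
    "Achieving orientation": (17.42, 17.81, 17.23, 17.11),
    "Reproducing orientation": (16.19, 14.52, 15.36, 13.43),
    "Meaning orientation": (17.07, 18.37, 17.07, 17.87),
    "Comprehension learning": (8.32, 9.02, 8.16, 8.97),
    "Operation learning": (8.12, 8.23, 8.28, 8.26),
    "Improvidence": (7.80, 7.74, 7.72, 7.13),
    "Globetrotting": (6.42, 5.68, 6.50, 5.27),
}


def mean_change(scale):
    """Return the (male, female) change in mean from first to second testing."""
    male_1, female_1, male_2, female_2 = TABLE_5_MEANS[scale]
    return round(male_2 - male_1, 2), round(female_2 - female_1, 2)


for scale in TABLE_5_MEANS:
    print(scale, mean_change(scale))
```

For the reproducing orientation the means fall by 0.83 for males and 1.09 for females over the six months.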


DISCUSSION 


The factor analysis presented here supports both the construct validity of the ASI-S for
use with Filipino subjects and the stability of the underlying factor structure. The
confirmatory method has demonstrated that the data obtained on both occasions adequately
fit Entwistle’s model. Factor I, which essentially combines the “achieving orientation”,
“comprehension learning”, “meaning orientation” and “operation learning” scales, represents,
according to Entwistle, versatile learning in the sense of Pask (1976) plus organised
study methods. High scorers on this factor would be expected to do well academically. Factor
II, which has high loadings on the “reproducing orientation”, “globetrotting” and
“improvidence” scales, Entwistle describes as an index of pathological symptoms.


The repeated measures analyses indicated that a significant change had occurred in this 
group of students’ study methods over the six-month period since the first testing. The change 
that took place was largely attributable to a lowering in the ‘‘reproducing orientation’’ scores 
over time which suggests that the students were gradually becoming less reliant on rote 
learning — surely a trend in the right direction. 


The male students were still tending to utilise less admirable study methods than the 
females as found on the first testing. Indeed, the change in study methods that did occur was 
not influenced by such factors as the students’ self-esteem, locus of control, grades, IQ, or 
field independence. 

CONCLUSIONS 


This research has gone some way to demonstrating the factorial validity of the short 
version of Entwistle’s Approaches to Studying Inventory for use in a third-world setting such 
as the Philippines and the stability of the structure of learning processes over time. It also 
appears that during the time since the first testing the students tended to become less reliant on 
superficial study methods. Taken together with the previous finding, confirmed again on this 
occasion, that a deep level approach but not rote-learning or the tendency to jump to 
premature conclusions was likely to lead to academic success at this school, the data indicate 
that the school’s current reward system in terms of academic grades is conducive to good 
study methods. 


Requests for reprints should be addressed to Dr. D. Watkins, Education Department, University of 
Canterbury, Christchurch 1, New Zealand.


THE EFFECTIVENESS OF THE “TUFTY CLUB” IN ROAD SAFETY
EDUCATION


By C. ANTAKI, P. E. MORRIS AND B. M. FLUDE
(Department of Psychology, University of Lancaster)


SUMMARY. Tufty Clubs are the main educational intervention made by road safety officers to 
educate primary school children in the dangers of pedestrian accidents. The effectiveness of Tufty 
Clubs was assessed by testing 186 5-year-old children on road safety knowledge both in the middle 
of their first school year and again at its end. Children from schools with Tufty Clubs were no better 
at either test, nor was there any sign of an interaction that would have indicated that the Tufty Clubs 
selectively improved knowledge during the year. 


INTRODUCTION 


Almost 20 per cent of all deaths among children in the age range 5-9 years are the result of 
pedestrian accidents and children in this age range have five times the rate of accidents 
compared with the adult population (Foot et al., 1981; Chapman et al., 1982).


The main educational intervention made by road safety officers employed by local 
authorities in an effort to reduce these figures is through the Tufty Club. The Tufty Club was 
founded in 1961 by the Royal Society for the Prevention of Accidents (RoSPA). Tufty is a 
squirrel who, with his animal friends, features in road safety stories, songs, colouring sheets 
and other activities. Many primary schools run Tufty Clubs at which there are regular 
activities based upon Tufty Club material, although the extent of use of Tufty Club material 
varies between schools. 


Do Tufty Clubs help train children in a knowledge of the dangers of the road? Certainly 
head teachers of primary and middle schools, playgroup leaders and road safety officers think 
that they do (Firth, 1974), and they are popular with the children themselves (Firth, 1973). It 
would be heartening for the teaching staff, parents and road safety officers who put in time to 
make the Tufty Clubs function if they knew that their efforts were improving the road safety 
knowledge of the children. Unfortunately, the small amount of published research on the 
effectiveness of Tufty Clubs is not very encouraging. Colborne (1971) found that preschool 
children could criticise Tufty's action in running into the road in a given picture, but complete 
understanding of the message intended was rare. Firth (1973) compared children's responses 
before and after a Tufty book was read to them by their mothers. There was no significant 
difference on a behavioural test using a model, although the road crossing knowledge shown 
by the children did improve following the reading of the book. 


The study reported here was designed as a longer term assessment of the effectiveness of 
Tufty Clubs. Children from schools that did and did not have Tufty Clubs were tested twice 
during their first school year: once in the middle of the year and once at the end. The tests 
were of two types. One involved the children identifying proper and ‘‘naughty’’ behaviour 
seen on a video tape of people using a busy pelican crossing. The second involved answering 
questions about pictures depicting road scenes. The children were tested twice to assess the 
development of the children's road safety knowledge during a half year. It was hoped that the 
Tufty Clubs would produce greater improvement for children from schools with such clubs. 


It is possible that there are factors which lead some schools to have Tufty Clubs and 
others not to do so. These factors might influence the level of knowledge of children 
independently of the influence of the Tufty Club itself. One advantage of the present design is 
that if there was an interaction between the first and second testing of the children, with the 
Tufty Club children doing proportionally better on the second test, this would strengthen the 
evidence that the Tufty Club was contributing to knowledge. A superiority of Tufty Club 
children on one or both tests could be more easily dismissed because of possible differences 
between schools. 


The tests used were of verbalisable knowledge, rather than actual behaviour. It is possible
that a knowledge of the dangers and necessary skills of road safety will not generalise from the
type of tests that we used to behaviour on the road. This distinction between knowledge and
behaviour has been known at least since Belbin (1956a, 1956b). Belbin found that there was no
relationship between children’s recall and recognition of specific pieces of road-safety
information on the one hand and actual road behaviour on the other.


However, it is possible that, even if there is no relationship between knowledge of specific 
pieces of road safety advice on the one hand and behaviour on the other, there is at least a 
relationship between amount of exposure to systematic road-safety awareness training (as 
evidenced by Tufty Club membership) and general verbalised awareness of correct and 
incorrect road behaviour. This is what we set out to test. A child with a good knowledge of 
what they ought to do may not always put it into practice, but a child who does not know what 
they should do, or why, is in even greater danger. 


METHOD 

Test construction 

Two subtests were constructed. One (video) showed people at a busy pelican crossing.
The tape was edited to show proper, improper and ambiguous use of the crossing. Children
could score up to 23 points by identifying “naughty” behaviour (e.g., anticipating the signal)
and expressing a grasp of why it was improper (e.g., by saying “the cars haven't stopped”).
The second subtest (cards) presented children with sets of cards depicting proper and improper
roadside behaviour. Children could score up to 17 points by identifying the proper card in a set
(e.g., child depicted holding a ball in a bag while walking on the pavement) and expressing a
grasp of why other pictures were unsafe (e.g., “because if you bounce the ball it might go in
the road”).


The tests were constructed in collaboration with the Lancaster and District Road Safety 
Officers. 


Design 

All 59 primary schools in Lancashire District 1 were contacted, and 31 agreed to take part
in the research, of which 13 used Tufty Club material (“Tufty schools”) and 18 did not
(“Non-Tufty schools”).


Within each school, three boys and three girls were seen in the middle of the school year 
(February) and again at the end (July). The 186 children tested were the ones who were aged 
exactly 5 years 6 months, or as close as possible at the time of the first test. This procedure was 
adopted to make selection as automatic as possible to avoid teacher biases. 


Administration 

Each child was seen individually in a quiet room for approximately 15 minutes by 
the third author. Order of video and card tests was counterbalanced. In approximately half
the Tufty and half the non-Tufty schools the video test was given first at Time 1 and second at
Time 2.


Attrition rate 

Of the 186 children seen at Time 1, 12 (6.4 per cent) were not available for test at Time 2
owing to absence from school on the test day. This left 71 children in the “Tufty” group and
103 children in the “Non-Tufty” group.
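
The sample sizes follow directly from the design, as the brief check below shows. All figures are taken from the text; the variable names are introduced for the example, and the split of the 12 absentees between the two groups is derived here rather than reported.

```python
CHILDREN_PER_SCHOOL = 6          # three boys and three girls per school
TUFTY_SCHOOLS = 13
NON_TUFTY_SCHOOLS = 18

tested_time_1 = (TUFTY_SCHOOLS + NON_TUFTY_SCHOOLS) * CHILDREN_PER_SCHOOL
absent_time_2 = 12
tested_time_2 = tested_time_1 - absent_time_2

# Group sizes remaining at Time 2, as reported in the text
tufty_time_2 = 71
non_tufty_time_2 = 103

# Derived: how many children each group lost between the two tests
tufty_lost = TUFTY_SCHOOLS * CHILDREN_PER_SCHOOL - tufty_time_2
non_tufty_lost = NON_TUFTY_SCHOOLS * CHILDREN_PER_SCHOOL - non_tufty_time_2

attrition_per_cent = 100 * absent_time_2 / tested_time_1
print(tested_time_1, tested_time_2, tufty_lost, non_tufty_lost, attrition_per_cent)
```

The attrition works out at a little under 6.5 per cent, and on these figures it was spread fairly evenly across the two groups of schools.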


RESULTS 
Using the Spearman-Brown formula to determine reliability from a split-half test, the
correlation between the video and card subtests at Time 1 was 0.56 and at Time 2 0.59. The
test-retest correlation of the total test scores at Time 1 and Time 2 was 0.53. These
correlations indicate a reasonable, if not high, degree of reliability in the test's construction.
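
The Spearman-Brown formula referred to above can be sketched as follows. The sketch shows only the general formula; the text does not state whether the reported values of 0.56 and 0.59 are the correlations before or after the correction, so no attempt is made to reproduce them, and the function name `spearman_brown` is introduced for the example.

```python
def spearman_brown(half_correlation, length_factor=2.0):
    """Estimated reliability of a test lengthened by `length_factor`.

    With the default factor of 2 this is the usual split-half correction:
    reliability = 2r / (1 + r), where r is the correlation between halves.
    """
    return (length_factor * half_correlation) / (
        1 + (length_factor - 1) * half_correlation
    )


# Illustrative value only: a correlation of 0.50 between two halves
stepped_up = spearman_brown(0.50)
print(stepped_up)
```

The correction always raises a positive correlation between halves, reflecting the greater reliability expected of the full-length test.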


The main analysis was of the children's scores in a four-factor design (Tufty exposure ×
Test order × Child's sex × Time of test). The only significant effect was the main effect of
Time (F(1, 166) = 95.94, P < 0.00001). There was a marked improvement in scores on the
second administration of the test. This improvement did not differ significantly between
“Tufty” and “Non-Tufty” children: the F value for the interaction was F(1, 166) = 1.12, P >
0.05. Table 1 shows the appropriate means.


TABLE 1

MEAN PEDESTRIAN ROAD-KNOWLEDGE TEST SCORES OF CHILDREN IN “TUFTY” AND “NON-TUFTY”
SCHOOLS AT TWO TIME PERIODS. THE MAXIMUM POSSIBLE SCORE WAS 40. STANDARD DEVIATIONS IN
BRACKETS

                          Time 1               Time 2
                          (2nd school term)    (3rd school term)
“Tufty” children          18.7 (4.6)           23.0 (4.7)
“Non-Tufty” children      19.2 (4.9)           22.7 (6.2)

DISCUSSION 


As the analysis indicated, the children improved markedly in the six months between the 
two tests. No knowledge of results was given at the first test, and the interval between test and 
retest is a long one for children in their first year at school. It is therefore unlikely that the 
improvement reflects learning during the original test. The children do appear to be improving 
their knowledge of correct pedestrian behaviour during their first school year (though, as we 
noted in the introduction, this may have no obvious relation to their behaviour). However, the 
comparisons involving the Tufty Clubs are disappointing. It is, of course, possible that the 
children’s behaviour has improved through their Tufty Club experience, even though their 
knowledge of road safety has not been affected. Nevertheless, there is no evidence, either in 
the overall performance of Tufty Club schools or in the interaction between tests 1 and 2 that 
the regular experience of a Tufty Club has contributed anything to the children’s knowledge 
of road safety. Given the input of time and money involved in these clubs this finding is 
important. It reveals the need to evaluate techniques for teaching road safety so that suitable 
methods can be identified, and the need, given the present results, for further consideration of 
suitable ways of teaching road safety knowledge quickly and effectively to young children. 
The children’s performance in the present study showed that they had considerable learning 
before them, since they made many incorrect responses. Clearly, if the number of young 
children who are killed or seriously injured each year is to be reduced, then new methods of 
teaching road safety need to be devised and tested. 
SEX AND PERSONALITY DIFFERENCES IN SECOND LANGUAGE 
PERFORMANCE IN SECONDARY SCHOOL PUPILS 


By R. J. RIDING and C. E. BANNER 
(Department of Educational Psychology, University of Birmingham) 


SUMMARY. Two experiments investigated the relationship between sex and personality and second 
language performance in secondary school pupils. In the first, 56 13-14-year-old children were given 
the Junior Eysenck Personality Inventory, Raven’s Standard Matrices and three French 
performance tests (comprehension, essay and prose translation). Two significant effects were 
found: (a) The performance of girls was superior to boys; (b) there was an interaction between 
Raven’s matrices, extraversion and test type such that for extraverts, high Raven’s subjects were 
superior to low Raven’s on the essay and translation tasks which required a large amount of 
translation from English to French, but not on comprehension, while for introverts the pattern was 
reversed. In the second experiment 48 12-13-year-old children were given the Junior Eysenck 
Personality Inventory, Raven's Standard Matrices and two French performance tests 
(comprehension and essay) which were both scored according to ability to use singular and plural, 
set phrases and verbs. Again girls were superior to boys. There was also a significant interaction 
between Raven's, extraversion and skill type. 


INTRODUCTION 
The aim of this paper is to explore individual variation in second language learning. Two 
studies investigated the possible relationships between sex, personality and French 
performance by British children. 


EXPERIMENT ONE 
METHOD 
Aim 
The aim of the experiment was to investigate the relationship between extraversion, sex, 
Raven’s Matrices scores and performance on French tests by children for whom it was a 
second language. The extent to which the tests required translation from English to French 
was varied. 


Subjects 
56 13-14-year-old children whose native language was English were drawn from all 83 
children in the top band of a mixed comprehensive school in the manner described below. 


Materials and procedure 

The Junior Eysenck Personality Inventory (S. Eysenck, 1965) was used to assess 
introversion-extraversion, and Raven’s Standard Progressive Matrices (Raven, 1956) to assess 
reasoning. 


Language performance was assessed by means of a one-hour printed paper given as part 
of the school’s usual annual examinations and containing three language tests as follows. (i) 
French comprehension. This consisted of a 208-word passage and 12 questions, both in 
French, to be answered by sentences in French. Two marks were available for each question: 
one mark was awarded for correct comprehension and one mark for correct French, with 0.5 
mark being deducted from the latter for each error. The maximum score was 24. (ii) Essay. 
The essay was to be written in French and based on a sequence of six pictures. One point was 
deducted from 30 for each French inaccuracy, except for accents, where 0.5 mark was 
deducted. A mark out of ten was then added for the quality of the content. The maximum 
score was 40. (iii) Prose Translation. A 106-word English prose passage was to be translated 
into French. For scoring, the passage was divided into 25 phrases with one mark for each one 
correctly translated. 


In the four weeks before the examination Raven’s Matrices and the Junior Eysenck 
Personality Inventory were given to the class groups. The children were then allocated within 
sexes into high and low groups for both Raven’s Matrices and Extraversion. This gave eight 
groups with seven subjects in each. 


RESULTS 
The mean scores for the treatments are given in Table 1. 


TABLE 1 
MEAN SCORES ON LANGUAGE TESTS IN EXPERIMENT 1 (STANDARD DEVIATIONS IN BRACKETS) 

                                   Mean score on language tests (%) 
Sex    Raven's  Extraversion   Comprehension    Essay                 Translation 
Boys   High     Extraverts     46.85 (9.24)     [illegible] (15.37)   42.28 (14.47) 
                Introverts     38.28 (14.79)    22.88 (7.91)          17.14 (10.84) 
       Low      Extraverts     41.42 (9.66)     31.42 (10.34)         30.28 (12.62) 
                Introverts     34.85 (13.47)    36.57 (20.55)         29.71 (28.11) 
Girls  High     Extraverts     53.71 (10.87)    [illegible] (13.38)   45.71 (21.25) 
                Introverts     54.85 (8.47)     50.85 (6.56)          41.14 (28.01) 
       Low      Extraverts     56.57 (6.56)     54.85 (9.49)          36.57 (26.39) 
                Introverts     45.14 (11.65)    [illegible] (12.82)   36.57 (21.74) 
Both   High     Extraverts     50.28            [illegible]           44.00 
                Introverts     46.57            [illegible]           29.14 
                Difference     3.71             [illegible]           [illegible] 
       Low      Extraverts     [illegible]      [illegible]           [illegible] 
                Introverts     [illegible]      [illegible]           33.14 
                Difference     [illegible]      [illegible]           [illegible] 



A four-way analysis of variance of Extraversion × Raven’s × Sex with repeated 
measures on Language Sub-test Type was performed on the data. This showed two significant 
effects: (a) a significant difference in performance between the boys and girls (F = 8.41; df 
1,48; P < 0.01); (b) an interaction between Raven’s Matrices, Extraversion and Language 
Sub-test Type in their effect on performance (F = 3.62; df 2,96; P < 0.05). These two 
findings will be considered in turn: 


(a) Sex difference. The mean score obtained by the girls was higher than that for the boys 
on all three tasks. The overall means were 47.12 per cent for girls and 35.57 per cent for 
boys. The superiority of the girls is consistent with findings of other studies (e.g., Nisbet and 
Welsh, 1973; Powell, 1979; DES HMI Report, 1985). Since there were no significant 
interactions between sex and the other variables, no additional evidence about the nature of 
sex differences was available. 


(b) Extraversion, Raven's Matrices and test-type. Inspection of the lower section of 
Table 1 indicates that, for a given level of Raven's, extraverts are superior to introverts on all 
tasks. For the high Raven's group, there was little difference between extraverts and introverts 
on comprehension but the difference increased as the shift to translation from English to 
French increased. In the case of the low Raven’s group there was a difference in favour of the 
extraverts on comprehension but this decreased as the amount of English to French 
translation increased. 


EXPERIMENT TWO 
Aims 


The aim of the second experiment was the exploration of the interaction of second 
language performance in three skill areas (rules, set phrases and verbs) and task type 
(comprehension and essay) with Raven's Matrices and Extraversion. 




METHOD 
Subjects 
The subjects were 48 12-13-year-old children from the same comprehensive school as used 
in Experiment One, but the sample was different and was drawn from 83 children in the top 
ability band of the second year in the manner described below. 


Materials and procedure 
Raven’s Standard Matrices and the Junior Eysenck Personality Inventory were given to 
the subjects in groups. 


There were two language tests, each test being of one hour and presented in written form: 


(i) Comprehension. A 255-word passage in French followed by 16 questions in French to 
be answered in French was constructed. The questions were designed to test the following: (a) 
accuracy in singular and plural use (5 questions); (b) accuracy of use of set-phrases (5 
questions); (c) accuracy in the use of verbs (6 questions). A maximum of two marks was given 
for each question. One mark was awarded for correct comprehension and from the remaining 
mark half a point was deducted for each mistake in accuracy. The score for each skill type was 
then expressed as a percentage to allow comparison between skills. 


(ii) Essay. The pupils were instructed to write a composition in French of between 50 and 70 
words on one of three themes: (a) La famille, (b) L’école, (c) Mes amis. For each theme they 
were given 15-16 questions in English to guide them and the final title also had a set of six 
pictures. Essays were scored for use and accuracy in three skills: (i) words dependent on the 
rules of singular, plural, masculine, feminine and vowel. The first 15 uses of singular and 
plural were examined. One point was given for correct usage. Half a point was deducted for 
accuracy errors; (ii) “set phrases” which, if subject to change, were dependent on simple 
operations such as word substitution (e.g., Il est cinq heures. Il est sept heures). The first 10 
uses of “set phrases” were examined. One point was given for correct usage. Half a mark was 
deducted for accuracy errors; (iii) the ability to handle verbs and their rules for change. The 
first 10 uses of verbs were examined. One point was awarded for correct usage. Half a mark 
was deducted for accuracy errors. 


The subjects were grouped high and low according to their Raven’s and Extraversion 
scores within sexes. Excess subjects were randomly excluded to give eight cells with six 
subjects in each. 
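
The grouping procedure can be sketched in code. The paper does not state how the high and low cut-offs were chosen, so the sketch below ASSUMES a median split within each sex; the function name `allocate_cells`, the field names and the randomly generated scores are all hypothetical and serve only to show the mechanics of forming Sex × Raven's × Extraversion cells and randomly trimming the excess.

```python
import random
from statistics import median


def allocate_cells(subjects, cell_size, seed=0):
    """Split subjects into Sex x Raven's x Extraversion cells.

    subjects: list of dicts with keys 'id', 'sex', 'raven', 'extraversion'.
    High/low is decided by a median split within each sex (an ASSUMPTION;
    the paper does not state the cut-off rule). Cells larger than cell_size
    are trimmed at random; smaller cells are returned as they are.
    """
    rng = random.Random(seed)
    cells = {}
    for sex in sorted({s["sex"] for s in subjects}):
        group = [s for s in subjects if s["sex"] == sex]
        raven_cut = median(s["raven"] for s in group)
        extra_cut = median(s["extraversion"] for s in group)
        for s in group:
            key = (
                sex,
                "high" if s["raven"] > raven_cut else "low",
                "extravert" if s["extraversion"] > extra_cut else "introvert",
            )
            cells.setdefault(key, []).append(s["id"])
    # Randomly exclude excess subjects so that no cell exceeds cell_size.
    return {k: sorted(rng.sample(v, min(cell_size, len(v)))) for k, v in cells.items()}


# Hypothetical scores, generated only to exercise the function.
score_rng = random.Random(1)
pupils = [
    {"id": i, "sex": "girl" if i % 2 else "boy",
     "raven": score_rng.randint(30, 60), "extraversion": score_rng.randint(5, 24)}
    for i in range(83)
]
cells = allocate_cells(pupils, cell_size=6)
for key, ids in sorted(cells.items()):
    print(key, len(ids))
```

A median split does not by itself guarantee that every cell reaches the target size, which is one reason the cut-off rule matters when the pool (here 83 children) is only modestly larger than the 48 subjects required.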


RESULTS 
The mean scores for the groups are given in Table 2. 


A five-way analysis of variance of Extraversion × Raven's × Sex with repeated measures 
on Task Type and Skill Type was performed on the data and this indicated three significant 
effects requiring consideration: (a) a significant sex difference (F = 4.52; df 1,40; P < 
0.05); (b) an interaction between Raven’s, extraversion and skill type (F = 5.41; df 2,200; P < 
0.01); (c) all three skills were handled better in the essay test (F = 42.69; df 2,200; P < 0.001). 
These three findings will be considered in turn: 


(1) Sex difference 

Girls again did better than boys. The overall means were 43.17 per cent for girls and 
36.17 per cent for boys. This finding is in accord with the previous one and again no reliable 
interaction was found between sex and the other variables. 


(2) Raven's Matrices, extraversion and skill type 

The pattern of results is shown in the lower section of Table 2. The difference between 
extraverts and introverts for the high Raven's Matrices score groups was greatest in the case of 
the verbs. For the low group the differences were fairly constant and in favour of the 
extraverts. 
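
The Verbs entries in the lower section of Table 2 can be reproduced from the cell means in its upper section. The sketch below assumes that the lower section pools both sexes and both tasks with equal weight, which is reasonable because every cell contains six subjects; the dictionary layout is illustrative only.

```python
# Verbs cell means from Table 2, in the order:
# boys comprehension, boys essay, girls comprehension, girls essay.
verbs = {
    ("high", "extraverts"): [14.16, 30.83, 32.50, 42.50],
    ("high", "introverts"): [32.50, 40.00, 36.66, 26.66],
    ("low", "extraverts"): [36.67, 40.00, 34.16, 35.00],
    ("low", "introverts"): [25.00, 26.66, 25.83, 25.00],
}

# Unweighted mean over sex and task for each Raven's x Extraversion group.
pooled = {key: sum(vals) / len(vals) for key, vals in verbs.items()}

# Extravert minus introvert difference within each Raven's group.
difference = {
    raven: pooled[(raven, "extraverts")] - pooled[(raven, "introverts")]
    for raven in ("high", "low")
}

for raven in ("high", "low"):
    print(raven, f"{difference[raven]:+.2f}")
```

The pooled means and differences agree with the tabled values (30.00, 33.96 and -3.96 for the high group; 36.46, 25.62 and 10.84 for the low group) to within rounding of the published cell means.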


(3) Skill type and test type 

In the essay questions subjects scored higher in all three skills than in the comprehension. 
However, since there is no way of equating the general difficulties of two dissimilar tasks the 
finding should probably be disregarded. 
TABLE 2 

MEAN SCORES ON LANGUAGE TESTS IN EXPERIMENT 2 
(STANDARD DEVIATIONS IN BRACKETS) 

                                        Mean score on language tests 
                                   Comprehension                  Essay 
Sex    Raven's  Extraversion   Rules    SetPh    Verbs      Rules    SetPh    Verbs 
Boys   High     Extraverts     44.16    11.66    14.16      68.33    62.50    30.83 
                               (10.50)  (8.97)   (11.33)    (14.62)  (19.94)  (17.89) 
                Introverts     29.16    10.83    32.50      55.83    60.00    40.00 
                               (22.35)  (12.72)  (26.88)    (8.85)   (12.85)  (22.36) 
       Low      Extraverts     29.17    11.67    36.67      41.67    59.17    40.00 
                               (15.39)  (9.34)   (24.61)    (21.73)  (16.69)  (12.91) 
                Introverts     30.00    8.33     25.00      50.00    50.00    26.66 
                               (16.32)  (13.12)  (28.86)    (24.15)  (22.91)  (15.80) 
Girls  High     Extraverts     53.33    19.16    32.50      70.83    60.00    42.50 
                               (20.34)  (27.14)  (22.68)    (16.18)  (24.32)  (16.00) 
                Introverts     46.66    17.50    36.66      53.33    60.83    26.66 
                               (15.45)  (11.08)  (17.24)    (13.74)  (16.43)  (11.42) 
       Low      Extraverts     63.30    33.33    34.16      76.66    77.50    35.00 
                               (25.40)  (25.11)  (25.72)    (14.33)  (14.36)  (13.84) 
                Introverts     35.00    10.00    25.83      45.00    63.33    25.00 
                               (29.72)  (8.61)   (13.66)    (18.02)  (22.66)  (12.90) 

                                   Both tasks 
Sex    Raven's  Extraversion   Rules          SetPh          Verbs 
Both   High     Extraverts     [illegible]    [illegible]    30.00 
                Introverts     [illegible]    [illegible]    33.96 
                Difference     [illegible]    [illegible]    -3.96 
       Low      Extraverts     [illegible]    [illegible]    36.46 
                Introverts     [illegible]    [illegible]    25.62 
                Difference     [illegible]    [illegible]    10.84 





DISCUSSION 


With respect to sex differences, both studies showed a clear overall superiority of girls 
and no hint of interaction with task type. 


In the first study, as the amount of English to French translation increased so the 
superiority of the extraverts increased for the high Raven's score group with the opposite 
result for the low group. In the second, extraverts were again better than introverts 
particularly on skills requiring attention to the phonetic rather than the semantic 
characteristics of words. Taken together the two studies provide tentative evidence that 
performance in second language learning is related to personality which may in turn be related 
to learning styles. Generally the most appropriate combination is extravert and high Raven's 
score, although a low Raven's score appears to be efficient for some aspects of language work 
which require an overall view. It may be that a low Raven's score is related to a holist 
approach to learning. Additional work is required to replicate these findings with larger 
samples and to see how the relationships between learning style and language learning vary 
with the stage of learning. For instance, would the results found in the present paper also 
apply at GCSE and GCE A-level? Further, do the patterns of results found for French apply 
to other languages such as German and Spanish? 


Requests for reprints should be addressed to Dr. Richard Riding, Department of Educational 
Psychology, Faculty of Education, University of Birmingham, Ring Road North, P.O. Box 363, 
Birmingham, B15 2TT. 
ESSAY REVIEW 


BRADLEY, L., and BRYANT, P. (1985). Rhyme and Reason in Spelling. Ann Arbor: 
University of Michigan Press, pp. 112, $7.95. 


If in March, 1978, you had asked a group of infant teachers to write down what they 
knew about the teaching of reading on the back of an envelope, and then to address the front 
of it to Doctors Bradley and Bryant, just as they were embarking on their substantial 
longitudinal study of children learning to read, a lot of time, energy, money, paper, statistics 
(theirs) and bad temper (mine) would have been saved. Because all infant teachers, who spend 
much of their working lives teaching young children to read, know one or two inescapable 
truths about the process, and about the children involved in it, that Drs. Bradley and Bryant 
seem determined to exclude from their philosophy. Even on the back of a postage stamp, 
teachers who’d got wind of the basic premise and hypothesis of the study could have spelled 
out some basic principles of their own that might have acted as awful warnings: learning to 
read is complex, untidy and unpredictable; it is not mono-causal; normal children learn at 
different ages; there’s more to reading than reading test results; reading and spelling are very 
different and cannot be conflated. 


However, the envelope was never written or posted, and the project got under way. “Five 
rather gruelling years later . . . a cast of millions” has analysed the massive data, and the 
“project has done three things”. These “things” have to be quoted in full, in the authors’ 
own words, as I find it impossible to paraphrase them, or to quote selectively without a 
scattering of “sic”s. So: 


“Our project . . . has established a specific causal link between a preschool skill and 
reading and spelling. It has shown one effective way of preceding success in reading and 
spelling, independent of intelligence. It has suggested ways not only of predicting how 
[children] might do, but also of helping them over the barriers that our tests seem to 
identify.” 


Well, we shall see. 


The research design itself is an interesting combination of well-tried methods. A 
longitudinal study of 403 4- and 5-year-olds was combined with a training study of 65 of them. 
The large group was tested on a number of tasks designed to measure their ability to detect 
rhyme and alliteration in strings of isolated words; and the small group was given training in 
these skills in an attempt to increase their ability in reading and spelling, as measured across 
the whole group at the end of the study by a number of standardised tests. 


In a chapter immodestly titled “A Design For All Causes”, the authors argue that this 
combined design can establish, once and for all, causal links that neither longitudinal 
correlations nor training studies can establish on their own. Even if this were so for some causes, there 
must be a good deal more caution applied to the conclusions when the subject is young 
children learning to read. For, as our expert witnesses scratching away on the back of a second 
envelope will remind us, learning to read is not an identical experience for all children. 


Bradley and Bryant seem to be working on a misplaced culinary metaphor: in the kitchen, 
agreed, if you duplicate the ingredients and synchronise your ovens, you achieve two identical 
cakes. But life isn't like that in the primary classroom. It is the height of naivety to assume that 
there should be a single predictor of reading success, given the variation between individuals 
and the pathways they follow in learning. 


All teachers, however experienced, however good at predicting success and failure, still 
get surprises. In fact, as soon as the authors can tear themselves away from their multiple 
regressions, and look at individual children, they concede that this is so. Discussing Tom, a 
child who they predicted, wrongly, would fare very well academically, they are forced to 
conclude: “There do seem to be other good reasons for his failure, and his case serves to 
illustrate that there is more to reading than simply sound categorisation”. 


But this statement only appears eleven pages from the end of the book. Everything up to 
that point has been based on the opposite argument, if you can dignify by that term a position 
that is asserted, rather than established through evidence. The authors believe, and state with 
monotonous regularity, that “success at learning to read and write . . . depends on breaking 
words and syllables into phonological segments”. Even teachers who have not been through 
the road to Damascus experience of reading Frank Smith and Jill Bennett know that this isn’t 
so; even teachers who haven’t studied Margaret Meek’s Achieving Literacy, an account of 
non-reading adolescents whose very skill in phonological analysis was at the heart of their 
failure to read, know that this isn’t so. For every child learning to read who takes kindly to 
“sounding-it-out”, there is one who blithely ignores the helping hand of phonics. What is 
more, the authors seem blissfully unaware that there are other ways to approach and define 
the process of reading: the bibliography is woefully bare of any of the writers (Goodman, 
Clay, Bissex, Clarke, Bettelheim and Smith, to name but a few) whose alternative perspectives 
could at least have been acknowledged. 


There is another equally glaring conceptual error that undermines the whole enterprise. 
Blazoned on the front cover, the words “Reading and Spelling” appear together for the first 
time, but not the last. The entire research programme seems to have been written around a 
shotgun marriage between these two quite separate processes. “Reading and Spelling” is 
treated throughout the book as if it were one concept, one activity, one piece of learning. You 
don’t even have to be an infant teacher to know that this is nonsense, and that the differences 
between reading and spelling are far more important here than the connections between them. 


Even if the authors had precisely distinguished between reading and spelling, and the 
processes involved in each, and even if they had recognised the difference between reading 
isolated words, and reading in its common-sense meaning, there would still be no excuse for 
the other howlers that they perpetrate elsewhere. For example, they seem to have worked on 
the assumption that in the ideal world, all children will have a reading age equal to or greater 
than their chronological age. They are certainly upset that not all the children in their training 
study achieved this happy state. 


But that’s not what a reading age is. “Reading Ages” are deliberately constructed so that 
half the children in a particular age group will score below the mark and half of them above. 
And this muddle about the very nature of the scores they are analysing leads the authors into 
further error when they turn to individuals. To say of David, aged 8:6 at the end of the study, 
scoring 7:11 on Neale and 8:0 on Schonell (the authors make no comment on their choice of 
tests, and so neither shall I) that “in effect he had already fallen a year behind in reading”, 
and to describe him as a “reading failure”, is, at one level, emotional claptrap, and, at 
another, ignorance of what reading failure really is. 


The heart of the book is Bradley and Bryant's insistence that low ability on isolated word 
tests of sound categorisation can act as a predictor of eventual learning difficulties. They seem 
to have had lessons in assertiveness from the Bellman (“What I tell you three times is true”) 
but their own facts, especially before they get started on their lengthy statistical 
manipulations, tell a different story. At the beginning of the study, there were 53 high-scorers 
on the sound categorisation tests, and 25 low-scorers (in addition to the children selected for 
training). Only 16 out of the 53 (or, as Bradley and Bryant put it, “rather few”) became good 
readers; only 7 out of the 25 low-scorers became poor readers. A most untypical note of 
caution edges its way on to the page at this point: “We have to face the fact that our sound 
categorisation test on its own may not be a particularly effective way of predicting eventual 
learning difficulties”. But why give up so easily? Two pages later, hope is in sight: “We had 
after all set ourselves a very difficult task”, and confidence is returning: “Even at their 
weakest point as predictors, our sound categorisation measures do better than no measures at 
all”. (Did they ever consider asking the teachers to predict? Silly question.) And by the time 
two more pages of excuse and explanation have passed, we are back to the original position: 
“It is gratifying that the Nursery group's scores were so good at telling whether they would 
become unusually good readers”. Once again, the facts and figures are at odds with the 
authors' claims for them. Fifteen children in the nursery group scored high on the initial tests; 
seven of these later became “unusually good readers”, scoring one SD higher than the rest of 
the group on Neale. Seven children scored low at the beginning of the study, and one of these 
became a poor reader, in the authors’ sense of the word. If this is what a “really rather good” 
predictor looks like, they are welcome to it; particularly as you will have noticed, as the 
authors don't seem to have done, that they can't predict which seven out of 15 will succeed, or 
which one out of seven will fail. These, then, are the facts from which the authors conclude, 
“we have discovered a useful addition to ways of predicting educational success”. 
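
The counts quoted in the paragraph above can be turned into simple proportions. The sketch below does no more than that; the labels are paraphrases of the review's wording, and no figures beyond those quoted are assumed.

```python
# Counts quoted in the review: (children who met the prediction, children in the group).
outcomes = {
    "main sample, high scorers who became good readers": (16, 53),
    "main sample, low scorers who became poor readers": (7, 25),
    "nursery group, high scorers who became unusually good readers": (7, 15),
    "nursery group, low scorers who became poor readers": (1, 7),
}

# Proportion of each group for whom the prediction held.
hit_rates = {label: hits / total for label, (hits, total) in outcomes.items()}

for label, rate in hit_rates.items():
    print(f"{label}: {rate:.0%}")
```

Expressed this way, the prediction held for roughly 30 per cent and 28 per cent of the two main-sample groups, and for roughly 47 per cent and 14 per cent of the two nursery groups.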


The authors’ other pride and joy, their claim to have found ways of preventing reading 
failure, is equally unimpressive. Infant teachers worth their salt will not be surprised to hear 
that one group, who were trained using a set of little plastic letters to form words of one 
syllable, learned more about spelling than the other groups (or, to be precise, their spelling 
ages were 13 months ahead of their nearest rivals). They may however be astonished to hear 
what Bryant and Bradley make of this: “We have established a powerful educational 
technique”. Another item for the teachers to squeeze into one remaining corner of their 
envelope: “We don't know everything, but we do know some things”. 


Lastly we must look at a question that casts its shadow over the whole project: the ethics 
of the experimental part of the study. The authors selected their group for training from the 
lower end of their original tests; these 65 children were, therefore, according to the hypothesis, 
likely to achieve lower reading and spelling levels than the rest of the group. They were 
assigned to four training groups; two were trained in a way that was predicted to raise their 
scores on various tests two years later; one group was given training that was predicted to have 
no effect on their scores, and one group was given no training at all (they may well have been 
the lucky ones). In effect, the authors said to the teachers: here are 65 children, who, we 
predict, have a higher than usual chance of failing at reading; so we are going to give 32 of 
them extra help, the rest, nothing. At least, they can't have said anything of the sort to the 
teachers, or I suspect they would have been shown the door. And I wonder how they put this 
procedure over to the children's parents? 


As it was, I fear the children in the group who received the irrelevant training fared the 
worst, in terms of what they had to endure during the study, never mind their reading ages. 
They were trained to categorise by concept, instead of by sound, the same archaic little 
pictures as the other two groups; each individual child in the group, working alone with the 
experimenter, was taught how to group together pictures of a bat, pig, rat, and hen as 
animals; hen, man, leg, and dog as living things; and bag, bed, cot and box as “found 
indoors”. If you find this uninspiring or trivial to read about, imagine the feelings of the 
children who received this training for 40 sessions, spread over eighteen months! 


If these are the deeds that have to be done to achieve such pitiful results, then it is no 
wonder that some teachers distrust educational research, and all its works and pomp. And this 
distrust may well be a healthy symptom, if it encourages teachers to justify their judgments 
and organise their knowledge and understanding rather better than some of their more 
academic colleagues. 

MARY JANE DRUMMOND 


BOOK REVIEWS 


Baron, J. (1985). Rationality and Intelligence. Cambridge: Cambridge University 
Press, pp. 299, £25.00 hbk. 


Jonathan Baron’s “aim is to define a conception of rational thinking that can serve as a 
basis for the teaching of good thinking” (p. 2). His ambitious text deserves to be read by 
anyone who wants to be clear about philosophical, psychological and educational aspects of 
an adequate theory of rational thinking (and preferentially by anyone who has all three 
interests). Even if there is objection to Baron's position in any one of these three respects — 
readers who are irritated by this or that detail of Baron's position may take their turn! — there 
is no doubt that the position is important just because it has this threefold basis. 


Baron draws a sharp distinction between cognitive capacities (such as discriminative 
speed or working memory), which may affect success on tasks but which are not open to 
instruction, and dispositions, which may also affect task success but which are open to 
instruction. Using this distinction, rationality is to be distinguished from intelligence, since the 
former is defined as a sub-set of the latter. Specifically, intelligence is the set of properties 
which make for effectiveness, that is, the achievement of rationally chosen goals which are 
difficult to achieve. Both capacities and dispositions affect the effectiveness of intelligent 
thinking. By contrast, rationality is that sub-set of properties which are dispositional in 
character and applicable to the methods or rules chosen in thinking, not to the outcomes of 
thinking as such. A person is rational if that person, placed in some environment and endowed 
with some cognitive capacities, arrives at conclusions on the basis of means (methods, rules) 
which are better than alternatives (pp. 14-16). 


So construed, rationality is attributable to the making of a decision or to the formation of 
belief (pp. 206-207). Baron thus refers to rationality in a comparative sense; for example, to 
say that a person is irrational is to say that the person could have used a “better” (p. 5) 
method than the one in fact used. Yet Baron also states that rationality is (and not merely is 
analogous to) a moral trait such as honesty since rational thinking is under a person's control 
and irrational thinking is worthy of censure (pp. 234-236). In consequence, rationality is 
construed in an absolute sense. But a person, who is honest (rational), is so, period. The 
person is not merely more honest or more rational, not even by consideration of the rule used 
in being honest or rational. Thus even if irrationality is given a comparative construal, it does 
not follow that rationality may be similarly construed. This in turn is a reflection of the 
distinction between rationality qua rational method and rationality qua rational person. 


Baron's theory illustrates the normative, prescriptive and descriptive elements of a theory 
of rational thinking. A utility theory of rational choice is normative in character (see chapter 
2). A descriptive model of rational thinking is concerned with phases or moves: “Goals 
determine when thinking is successful, possibilities are potential satisfiers of goals, and 
evidence is used to evaluate possibilities” (p. 87). Protocol-analysis is recommended in the 
investigation of moves in thinking, moves which “might” (p. 102) correspond to actual 
thinking. The prescriptive aspects are a model which, if followed, leads to rational thinking. 
Baron's review of relevant educational issues is both selective and inadequate. There is evident 
bias in this review — in favour of psychological at the expense of educational research and in 
favour of research in cognitive at the expense of developmental psychology. (Test: why is the 
work of Robbie Case by-passed?) There is in consequence some doubt as to whether the 
review in chapter 7 can serve as a basis for teaching at all. 


To sum up: Jonathan Baron has illustrated a general position which we should all accept, 
even if this or that detail of his position is unconvincing. Accepting his call for inter- 
disciplinary collaboration (p. viii), let us in future show concern only for theory with both 
normative and descriptive and prescriptive elements! 

LESLIE SMITH 


Fontana, D. (1985). Classroom Control. London: Methuen, pp. 186, £4.95 pbk,
£11.95 hbk.


Inability to control a class is the single most common reason for failure both on teaching 
practice and during the probationary year. This thorough and wide-ranging psychological
perspective, published jointly by Methuen and the BPS, is a valuable contribution to the field. 
The book is written in two parts: the first covers the causes of problems, the second considers 
strategies which teachers might use to improve their competence in class management. 


Fontana uses the word “control” rather than “discipline” because of the political
connotations of the latter. It is an inescapable fact of life nowadays that political aspects of 
education cannot be ignored, but the author is right to concentrate on psychological insights. 


In the first part of the book there is a detailed analysis of likely pupil behaviour which 
may lead to problems of control. These include age-related matters, such as social 
relationships at different stages of development, perception of teachers and sheer physical 
size, as well as factors influenced by ability, socio-economic status and ethnic background. 
The vantage point of both pupils and teachers is considered, and pupils’ self-esteem, personal 
adjustment as well as Eysenck-inspired affective aspects, such as extraversion-introversion 
and neuroticism-stability, are discussed. 


The majority of the book is devoted to teachers’ strategies, and part two gives three 
chapters to behavioural, cognitive and managerial approaches, with two further chapters 
devoted to their perceptions and management of themselves. Behavioural approaches arouse 
the ire of many practitioners, and Fontana’s treatment of both the techniques and underlying 
issues is very fair-minded. Topics such as reinforcement, shaping, classroom rewards and 
punishments are covered, and the views of critics are answered in a question and answer 
format. There is a section on performance contracting, and I should have preferred here a 
fuller discussion on the question of the pupil’s role in negotiation, because some forms of 
contract are as one-signatory and authoritarian as the form of behaviour modification they 
are meant to improve. 


In the section on cognitive approaches there is quite proper emphasis on pupils’ 
understanding. Again critics of the rational approach to class management, which tends to 
stress clarity of explanation, the setting of attainable objectives, pre-planned variation of 
teaching mode, and the avoidance of unnecessary or impractical threats, see this as a teacher- 
dominated authoritarian view of classroom life. Fontana emphasises the need for pupils to 
understand and indeed be party to these processes. 


The most practical chapter, from the teacher’s point of view, is the one on actual class 
management techniques. Like the rest of the book this section is based not on the author’s 
own empirical research, but rather on a mixture of commonly reported research findings and 
sound commonsense. The list of sub-headings is familiar and the advice is sound: be punctual, 
be prepared, insist on co-operation (easier said than done sometimes), use the voice 
effectively, be alert to classroom events, analyse what you do, anticipate, where possible, 
crises, mark pupils’ work fairly, keep promises and so on. 


This is a book written by a psychologist for practising teachers and others working in 
education. It is a more than ample summary of the field and is written in a style which makes it 
enjoyable as well as authoritative. Effective classroom control is an important pre-requisite 
for many of the other skills which teachers need in their increasingly demanding job. Young 
teachers, those seeking to hone their competence, and anyone working with teachers 
experiencing problems will find Fontana’s book a valuable source of information. 

TED WRAGG 


Freeman, N. H., and Cox, M. V. (1985). Visual Order: The Nature and 
Development of Pictorial Representation. Cambridge: Cambridge University 
Press, pp. 393, £27.50 hbk.


Drawing has become a popular field of developmental research within the past few years. 
The reasons for this popularity are not difficult to find. Children's drawings seem to reveal 
global misconceptions about graphic representation, promising clues about general 
intellectual development, as well as a number of “little local difficulties” that may help us to
understand the young child's perceptual world. The best example of the first is so-called
“intellectual realism”, the tendency which young children have to produce a good, explicit
view of an array when we ask them to draw their own view of it. Its close relative is
“symbolism”, whereby children draw a prototype object (e.g., “the house” complete with
smoking chimney, path, and four equal-sized corner windows). In both cases they appear to
be drawing “what they know”, not what they see. The best example of a local difficulty is the
“perpendicular bias” in which chimney pots are drawn at right-angles to roofs, trees at right-
angles to hill-sides, and liquids at right-angles to tilted glasses.


This handsomely produced, well-edited volume of 20 chapters provides an excellent state- 
of-the-art picture of research into the state of children's art. The editing is not “passive”:
the introduction is thorough and useful and the editors pop up along the way with helpful 
commentaries. The reader does not, despite fears, finish the book knowing more about 
children's drawing than he ever wanted to because too many questions and uncertainties 
remain. If, however, he travels cover-to-cover, reviewer-style, he may well finish it feeling a 
little muddle-headed. In the case of this reviewer, muddle-headedness began to set in after 
page 287 when the editors listed seven (“and so on”) options for characterising how pre-
schoolers may possess both object-centred and viewer-centred “modes”.


Taking a broad view, the strengths and weaknesses of the book are easy to see because 
they are the strengths and weaknesses of the state of drawing research itself. The main strength 
lies in the richness of the data and the inventiveness — and charm — of many of the 
experimental studies. There does not appear to be a significant British drawing researcher 
missing from the line-up (and the best research in this area is British): Barrett, Bremner, 
Crook, Davis, Light, Pratt, Phillips, and Willats are all there as well as the co-editor, Maureen 
Cox — although there is no chapter by Norman Freeman. They all produce thoughtful essays, 
although in some cases not going far enough beyond their previously published work. There 
are some interesting, though less familiar, figures, such as the late R. K. Duthie on 
adolescents’ drawing. A host of important questions are raised by Lorna Self’s chapter on the 
amazing drawing skills of some subnormal and autistic children. Do such islets of photo- 
realism make the association between general intelligence and drawing ability look shaky, or 
should the rarity of such skills make us reluctant to draw any strong conclusions? 


Turning now to the weaknesses of current drawing research, these make their appearance 
in the present volume in two forms: the unsatisfactoriness of the attempts to locate drawing 
development within theories of perception, and the underdeveloped state of cognitive 
accounts of intellectual realism. In the first case, it is not at all clear that either Gibson or Marr 
have anything to teach us. In his scholarly treatment of Gibson, Alan Costall is frank about 
the incomplete nature of Gibson’s account of pictorial representation (though it is not very 
easy to see why he thinks that the notion of “conventionality” is going to make it any more
complete). As regards Marr, it is perhaps a muddling of explanatory levels to parallel stages of 
neural representation in the visual system with phases of graphic representation — as does 
Willats. 


Finally, there is the difficulty of locating the general cognitive significance of intellectual 
realism. There is a choice here. On the one hand, we can unpack the phenomenon of 
intellectual realism via a host of experimental manipulations showing that young children
“have access” to a viewer-centred mode, sometimes. Or we can argue: although the
tendency to intellectual realism is variable and situation-sensitive (what else would you expect, 
by the way?) it is definitely there, so let us try to account for it in the light of general theories 
of cognition rather than by fine-grain accounts of drawing. The former approach, when taken 
to its ultimate conclusion, must give us nativism. The latter approach is only available here in 
the chapter by Charles Crook. He argues that children find viewer-centred representation 
difficult and unnatural because they are over-determined by their “mental models” of scenes.
He reports some experiments that seem to favour this account of realism as against the view
that children are realists out of a need to convey information. Despite the use of the shibboleth
term “mental model”, this looks like one kind of suggestion that could lead to useful theories
of what children's drawings tell us about children's mental life.

JAMES RUSSELL


Hanko, G. (1985). Special Needs in Ordinary Classrooms. Oxford: Blackwell,
pp. 166, £6.50 pbk.


The title of this book is misleading. The content is concerned with the organisation and 
development of teacher centred, professionally serviced, support groups. It is written
specifically for teachers and professional “consultants” who have responsibilities in the
ordinary schools for “special needs” children. “Special needs”, in this book, refers to
children who are emotionally and/or behaviourally impaired. 


The author works within four powerful frameworks: 


(1) that professionals have failed to give teachers usable support for E.B.I. pupils; 

(2) that the range of disturbed and disturbing behaviours in pupils is best understood as 
the product of the pupil's interactions with his class group and teachers; 

(3) that practising “grass roots” teachers provide the most useful support for each
other; 

(4) that teacher groups can manipulate the professionals into giving relevant additional 
support. 


Teachers working with E.B.I. pupils are seen by the author to lack knowledge and under- 
standing of the aetiology of maladjustment, lack objectivity in observing problem behaviours, 
lack confidence in their own judgments and skill in modifying the “normal” curriculum.
Teachers are subject to increased stress, loss of job satisfaction and professional self-esteem. 
The pupils are seen as unsuitable for ordinary school education. 


Creating teacher support groups working within a psycho-dynamic model of non- 
directive intervention helps teachers overcome their difficulties by “recognising, building on
and appreciating teachers' existing strengths and expertise”, without removing the teacher's
responsibility for the pupil. Support for these claims is provided by the writer with illustrative 
case study material culled from 30 years of experience, now defended by an extensive and 
recent literature review. 


The model is appealing. The starting point in the support group has to be the individual 
teacher's perception of the problems. These are shared and extended by fellow teachers and 
“taken on” by the attending consultant hopefully into more useful modes of coping for the 
teacher concerned. 


The consultative teacher with responsibility for special needs children in the ordinary 
school will find in this work both a carefully defended rationale for their role in developing 
professional insights in their colleagues and less usefully, operational instructions for doing it. 
In practice the suggested “triad”, case presentation, information gathering and joint
exploration of the central issues, suggests the need for high levels of managerial and interactive
skills in all the participants if the group's work is not to become dominated, however subtly,
by the attending “consultant”.


The consultant teacher needs to be skilled in techniques of group management, have 
access to knowledge and understanding of the aetiology of maladjustment, have access to a 
range of ameliorative teaching strategies and access to the normal curriculum. Commonsense 
suggests these are likely, as in the case of the author, to be built up from both experience in the 
field and research in the literature. Teachers will be disappointed if they expect this book to 
provide a recipe for success. 


The consultative teacher with a responsibility for a normal school population including a 
range of special needs children has to decide how best to support their colleagues. Forming 
and supporting teacher groups, however effective in helping teachers cope with some of their 
most intractable pupils, has to be considered alongside the provision of in-class support and 
peripatetic help for sensory impaired, developmentally delayed and socially disadvantaged 
children. 


Ordinary class teachers with many demands on their limited professional time and energy 
need to see attending a support group as a high priority. Such teachers need to know if support 
groups are effective. The author of this book claims support groups work well, probably when 
the participants are highly committed. Readers seeking a rigorous evaluation will be 
disappointed. 

ELIZABETH GREEN


Hegarty, S., and Evans, P. (Eds.) (1985). Research and Evaluation Methods in
Special Education. Windsor: NFER-Nelson, pp. vii + 176, £10.95 pbk.


Stemming from papers presented at a symposium supported by the ESRC in March, 
1984, this collection of papers is claimed to introduce the reader to “two under-used but
vitally important methodological approaches” which are both “relatively unfamiliar to
researchers and practitioners in the field”. The two areas in question are single subject designs
and qualitative techniques. Can it really be the case that “researchers” (not student
researchers, I assume) are not acquainted with these two methods of research? Or is the reader 
to expect the more sophisticated intricacies of research procedures which are currently being 
devised in these particular fields? As the publishers’ blurb is one of the few which does not 
identify the intended readership with any greater precision than that quoted above, it is not 
clear at what level the book should be adjudged. 


In an opening paper Schindele elaborates in some detail the weaknesses of quantitative 
methods in the field of special education, and makes some pertinent observations about the 
relationship between research and social policy. This is echoed throughout the book as each 
speaker at the conference had introduced his theme (there were no women contributors). 
Whilst this is understandable at a conference, it would have been helpful if the editors could 
have excised some of this to avoid the repetition. 


The first half of the book on single case and small-N research designs includes papers by 
Kiernan on single subject designs, Robson on small-N case studies, and Halil on statistical 
methods in single case studies; and these may be placed on a scale of increasing sophistication. 
Kiernan's paper applies the familiar premises of Campbell and Stanley to the unique problems 
of the single subject design in a way which is clear and basic. It is therefore usefully placed 
near the beginning of the book. It is perhaps least satisfactory in its treatment of the 
generalisability of results wherein he argues that “such studies are valid in themselves from a
clinical viewpoint and can provide crucial insights into possible generalisable statements” (p.
43). However, such a concluding statement hardly seems to follow from the body of the text, 
and the reader is left in some uncertainty as to just how it can fulfil that latter requirement. 


Robson's paper sagely argues for a rapprochement between qualitative and quantitative 
designs for small-N studies. He then goes on to a thorough and systematic analysis of issues of 
validity, emerging with a conclusion that such studies best imbue confidence when they are 
repeated. Though how repetition is to be achieved amidst a welter of unspecified variables is 
not terribly clear. 


Many of us whose origins lie in the mists of positivist experimental psychology feel 
decidedly queasy when drawing conclusions on the basis of “eyeballing” time sequence
graphs. Halil provides the antidote to this condition in a detailed explanation of ARIMA 
(autoregressive integrated moving averages). This procedure not only interprets the statistical
significance of an intervention in a time series, but can be used to give information on the 
form of the effect, abrupt or gradual, permanent or temporary, and the chapter finishes by 
listing the standard packages which include this facility. 


The second half of the book deals with qualitative methods, and most of the basic concepts 
have been available in standard texts for some time. Corrie and Zaklukiewicz provide an 
introduction to qualitative research and case study approaches, indicating that in contrast to
the “physical science models”, these offer “descriptive and interpretive methods in order to
provide a much wider range of information” (p. 122), and “allow a sufficiently detailed and
accurate picture of the processes of special education to be built up” (p. 124) in terms of “a
realistic account of the world as participants experience it” (p. 135). Unfortunately terms such
as “sufficiently detailed”, “accurate”, and “realistic” are not unpacked to see what they
imply. Whilst not denying the usefulness of the methods they recommend, there is within this 
chapter a degree of fervour which glosses over some of the problems. 


In this section Adelman writes on action research, giving examples of some in which he 
has been involved, and Hammersley gives a thoughtful and personal view on what he 
considers ethnography to be. This is a particularly valuable contribution in that it points up 
some of the difficulties of this approach, and the dangers of perceiving methodologies as 
distinct and conflicting paradigms. He sensibly points out that the commonsense notions
which underlie the principles of Campbell and Stanley’s work cannot be suspended when one
adopts these alternate methods, and to do so simply results in bad research. 


The levels of contribution in this book are curiously varied. I doubt that professional 
researchers will find much of the content particularly enlightening. For my part, I would see it 
as a very useful book to give to students who are about to embark on some investigation of 
their own in the area of special needs. Those who might have been constrained by the 
orthodox experimental approach may well be relieved to find that other methods are also 
legitimate. 

GEOFF BROWN


Lynch, J. (1986). Multicultural Education: Principles and Practice. London:
Routledge and Kegan Paul, pp. 230, £8.95 pbk., £17.95 hbk.


This book is intended for “teachers and students in training” to provide “an
introduction to the field of multicultural and anti-racist education” and to present a
“practical approach”.


Lynch defines multicultural education as that education provided for all children, 
including those from “old-established and indigent (sic) minorities”, which aims at “a higher
stage of cultural competence and sensitivity”, “rationality, respect for persons, commitment
to discourse for the resolution of conflict, encouragement for human emancipation and
freedom . . . acceptance of equality before the law and mutuality of instrumental regulation,
acceptance of negotiated values and structures as also of the existence of some separate
structures and commitment to political unity at the level of the nation state, combined with
engagement for the above values globally”. Such principles have traditionally been used to
justify education in a liberal democracy; each presents complex dilemmas of policy and 
practice; to provide an introduction to the principles and practice of education in this sense is 
an ambitious undertaking. 


Lynch’s general approach may be seen in the following: “If the ‘rules for life’, which
Schermerhorn uses as a shorthand for culture, are lacking in sufficient overlap between 
cultural groups, as would be the case in cultural encapsulation, then the crystallised 
relationships, which Schermerhorn proposes as relating people to the major institutional 
activities of society, will be conducted on the basis of different rules and understandings. 
Inevitably conflict in society is thereby accentuated and existing legitimation crises are 
inflamed and sharpened for as long as discourse is not facilitated to attenuate them.” For this
reader, the style of this attracts attention to itself and away from the important point which is 
being made. 


Lynch provides useful overviews of policy debates in the UK and elsewhere: in the USA, for
example, whether multicultural education should include “ethnic, credal, social class, sex and
language as well as racial and handicapped considerations”; and in Australia where he
approves Bullivant’s recognition that “racial, cultural, ethnic, class and gender differences are
being adopted as boundary markers in processes of exclusion and inclusion which are
overlapping and multifaceted means of attempting to maximise social rewards and economic
resources”.


In “teaching multicultural education” (an odd expression) we need to bear in mind
differences in field-independence, field-dependence, and locus of control, and “remember
that culture is also introjected into non-verbal patterns of behaviour”. Lynch suggests that in
practice among possible methods “may be variable use of different approaches including oral
and visual, verbal and non-verbal stimuli, a broad selection of cultural artefacts, exemplars 
and content, different learning experiences for the children including peer group tutoring, co- 
operative group learning and graduated approaches to skill requirements, based on diagnostic 
and reflexive progression by the child which emphasises and rewards success across a spectrum
of different concepts of what reward might be”. It is not clear how helpful such a list could
be, partly because it is not clear which approaches, if any, are being rejected, and partly 
because of the lack of guidance as to how and why teachers might choose among them and 
apply them. It is here that detailed evaluation of classroom practice is desperately needed. 


There are other sections in the book: on school, parents and community, recommending 
effective dialogue; on assessment, calling for fairness in testing, and “multiple indicators
which can be reviewed speedily, accurately and holistically”; on staff development by means
of the “scanning of institutional and personal needs and the interfacing of this process with
discourse and interaction with the community”.


Lynch concludes by recommending that “the complex and interrelated nexus of policies”
should be “eased from theory into the workshop of professional practice”. It is difficult to
disagree, as it is difficult to disagree with much else in the book. This reader, however, would 
have welcomed a more straightforward style and a greater sense of the practices from which 
the theories derive and which they might be used to inform. The book deals with important 
issues. It contains helpful checklists, and a comprehensive range of ideas for discussion by 
those concerned with the professional development of teachers. However, more concessions 
to its intended readers would have made it a more useful contribution to the dialogue about 
professional practice which it so strongly recommends. 

IAN BLISS 


Reynolds, C. R., and Willson, V. L. (Eds.) (1985). Methodological and Statistical
Advances in the Study of Individual Differences. New York: Plenum Press, pp.
472, $54.00 pbk.


I have to declare from the outset that because of some ambiguity in the title of this book, 
I have ended up commenting on material with which I have limited familiarity. The 
juxtaposition of the terms “methodological” and “statistical” in the title seems to imply a
treatment of issues relating to behavioural research methods. Whilst four of the eleven 
chapters do cover statistical and assessment procedures, the bulk of the text is concerned with 
problems of experimental psychology. The first three chapters (comprising over one-third of 
the book) present detailed accounts of EEG related measures (by Eysenck and Barrett), of 
reaction time studies (by Jensen) and of neurophysiological approaches (by Hiscock and 
Mackay). Four more chapters are more cognitive in orientation, looking at componential 
theory (Sternberg), aptitude-treatment-interactions (Phillips), applied behaviour analysis 
(Kratochwill, Mace and Mott) and at the Kaufman Assessment Battery (Kaufman et al.). Only
the four remaining chapters fall naturally within the common understanding of statistical 
advances. One, by Hung and Walberg, is a useful introductory account of general linear 
models, another discusses procedures for the analysis of interactions (Willson is the author), 
and the other two describe criterion referenced assessment (by Hambleton) and path- 
referenced assessment (Bergan, Stone and Feld). There is the briefest of editorial prefaces 
(three pages) with no introductory or linking comment elsewhere in the book. 


The levels of presentation are very variable. The three neurologically orientated chapters 
would have little appeal to most educational researchers. From my naive position I suspect 
they may represent valuable collations of previous developments and the most fruitful future 
themes. Certainly both Eysenck and Barrett's and Jensen's chapters offer convincingly strong
evidence of the power of EEG and RT measures as indicators of intelligence. Moreover their 
promise is heightened by the absence of useful indications from Hiscock and Mackay’s 
lengthy survey of neuropsychological blind alleys. Some of the chapters are fairly ordinary 
descriptions that have questionable legitimacy in an expensive frontier text. The outline of 
Kaufman's K-ABC test pack is one such; the account of criterion referenced testing is another. 


The remaining half of the book does introduce promising avenues in an accessible 
manner. Hung and Walberg's introduction to general linear models is particularly effective in 
its use of demonstrations to substantiate points. The concluding chapter on path-referenced 
assessment would have benefited from a similar demonstration but, nonetheless, the structural 
analysis it embodies does provide a convincing link between theory and application in the area
of individualisation. The chapters on componential theory (an attempt to link psychometric
and cognitive studies), on aptitude-treatment-interaction (i.e., the teacher-learner
interaction), on the analysis of interactions, and on applied behaviour analysis (“emphasises
the effects of independent events (or variables) on the occurrence of specific behaviours (or
responses)”) all provide useful theoretical introductions to approaches that could generate
new strategies for educational research. 


The high price and the large size of this edited volume must attract some comment. That 
there is a tradition of expensive “Advances in . . .” books indicates that there is a market of
some kind. But there are reservations with this particular text. Firstly, one third of it is so 
different from the rest that it would better have formed a separate (smaller, more accessible) 
book. Secondly, the editorial provision is inadequate in view of the unfamiliarity and 
difficulty of much of the material. Last year, reviewing an edited book in the Bulletin, Peter 
Mortimore commented “neither they [the authors] nor the editors indicate where one
theoretical approach may complement or be related to another”. He concluded, “The most
disappointing feature of the collection, in my opinion, is the lack of cohesion”. The same
might be said of this book. 


As a parting shot I must make an observation I have made directly to publishers. Surely 
the academic market has now reached such a level of saturation that there is negligible real 
desire for this kind of book. Typically what happens is that one borrower takes it from the 
library to read one, two or maybe three chapters. The apathy syndrome endemic in most 
library systems means that it is returned several months later. Over a period of five years I 
doubt whether more than 10 people see the book. Is that really a proper use of library 
resources? In fact we are now in the modular book market. Sage have shown that by 
modularising specialist books you can charge £5 for each chapter and still satisfy the 
commercial needs of the publisher, and the study needs of the student. This volume is an ideal 
candidate for modularisation, particularly since the editorial provision is so inadequate. 


MICK YOUNGMAN 


Wilkinson, A. (1986). The Quality of Writing. Milton Keynes: Open University
Press, pp. 157, £6.95 pbk.


Andrew Wilkinson's new book has grown out of his report of the so-called “Crediton
research” project published as Assessing Language Development (and not as “Assessing
Children's Language” as Anthony Adams calls it in the introduction) in 1980. The original
research sought to verify four complementary models of writing development, affective,
cognitive, moral and stylistic in 8-13-year-old children, giving overall priority to a
consideration of the child as a “communicative being”.


In The Quality of Writing, the various features of the models are located within historical 
and theoretical perspectives and Wilkinson argues for greater account to be taken of the 
implications of the research in the assessment of children's writing in schools. There is some 
discussion of the ways in which the models may need to be refined over time and already the 
affective dimension has undergone change. The book is illustrated by many extracts from 
children's writing (although unfortunately not in facsimile form) and includes many points of 
debate and discussion which have emerged in related courses and studies at the University of 
East Anglia. There are numerous signs that the continual work necessary to validate and apply 
the models has been edging gently forward. 


At the same time, the book gives the impression that its underlying assumptions are just a 
little introverted. Alternative models of writing development, such as that which emerged 
from Bereiter's (1980) work in Ontario, are dealt with very briefly and a touch peremptorily. 
While pleading for no more studies which look at just the surface features of writing (such as 
those dealt with by Harpin's (1976) research), it is a pity that Wilkinson does not engage in 
debate with the more sophisticated linguistic approaches to examining children's writing 
provided by Katharine Perera (1984). For such approaches may be of considerable help in 
considering the inherent possibilities and effective constraints in the writing development of 
children for whom standard English is a second language or second dialect and of whom the 
Crediton sample seemed understandably unrepresentative. 
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Perhaps the most noticeable omission is any specific consideration of certain of the 
"foundations" of written language about which Wilkinson has written so successfully in the 
past. For instance, there is no mention of the seminal research in Ontario by Bereiter and 
Scardamalia into the psychological adjustments which children have to make in beginning 
writing, "from conversation to composition", a shift which may not be fully resolved until 
after several years of schooling, years which overlap with Wilkinson's focus. A particularly 
significant aspect of the Ontario research has been the demands of writing on the 
"autonomous production of language" without the inputs of a conversational partner, 
involving both "content knowledge" and "discourse knowledge" in the composing of 
writing, of which the design of the Crediton research did not allow detailed investigation. 


The Quality of Writing remains, though, an optimistic and inspiring book, in which the 
author's own stylish gifts are readily apparent and his deeply introspective approach to what 
writing is all about is very evident. At a time when the credibility of academics is so frequently 
questioned, it is cheering to be reminded by Anthony Adams that Andrew Wilkinson once 
won the prized Italia Award as a radio playwright. It is therefore surprising that there is no 
mention of another literary academic, Frank Smith, and his individually introspective Writing 
and the Writer, and regrettable that more attention is not given to those for whom writing 
continues to be painfully frustrating and unrewarding, superbly documented by Mina 
Shaughnessy (1977). 


The Quality of Writing ends with a potpourri of practical ideas for fostering the different 
dimensions of writing development in the classroom, ideas which themselves raise big 
questions about the nature of the curriculum in the middle years. Perhaps now attention needs 
to be given not only to children's writing but also to the circumstances in which it grows and 
flourishes; not only to the "communicative being", but also to the "communicative context" 
of the classroom. 

ROGER BEARD 
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CALL FOR PAPERS 


Second European Conference 
for Research on Learning and Instruction 


The second conference of the European Association for Research on Learning and 
Instruction (EARLI) will be held in Tübingen, West Germany, 19th-22nd September 
1987. 


It will be organised by Prof. Dr. Heinz Mandl (University of Tübingen). 


Main themes of the conference are: 

— Learning and instruction in natural settings. 

— Motivational and emotional factors in learning. 

— Knowledge acquisition and problem solving. 

— Concept development and learning. 

— Strategies for effective learning. 

— Diagnosis in learning and instruction. 

— Comprehension and learning from text. 

— Cognitive and social processes in learning with computers. 

— Basic skills in specific domains. 

— Classroom learning processes. 

— Special problems of learning and instruction in ethnic minorities. 


For more information please contact: 


Prof. Dr. Heinz Mandl 
Deutsches Institut für Fernstudien 
an der Universität Tübingen 
Bei der Fruchtschranne 6 
D-7400 Tübingen 
Tel. 07071/26028 
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CALL FOR NOMINATIONS 


NCME Award for the Best 
Application of Educational Measurement Technology 


The National Council on Measurement in Education has established three categories of 
awards: 


Category 1. Award for an outstanding dissemination of educational measurement 
concepts to the public. 

Category 2. Award for an outstanding example of the application of educational 
measurement technology to a specific problem. 

Category 3. Award for an outstanding technical contribution to the field of 
educational measurement. 


This year (1986-87) an award will be made only in Category 2. In subsequent years, 
awards will be made in other categories. 


Examples of application problem areas include, but are not limited to, selection or 
classification of students, measuring a hard-to-measure trait, evaluating an educational 
program or product, and integrating testing and learning. Selection criteria are quality 
and innovativeness of the application effort or the positive impact of the application on 
practice. 


To be eligible for this award, the application must have occurred initially during 1984, 
1985, or 1986. One may either nominate their own application work or, with 
permission, someone else's. Those responsible for the application need not be members 
of NCME. To be considered for the award, four copies of a 3-5 page statement 
describing the technology, application area, and products or results of the effort should 
be sent to the Awards Committee Chair. Later, finalists may be requested to submit 
additional information. 


Award nomination forms are available from the Chair: 


Dr. Howard Stoker 
UT, Memphis 
8 South Dunlap 
Memphis, TN 38163 


The deadline for submission is 30th January 1987. 
The award will be presented at the NCME's 1987 annual meeting in Washington, D.C. 
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NOTES TO CONTRIBUTORS 


Manuscripts are accepted on the understanding that they report unpublished work that is not under consideration 
for publication elsewhere, and that if accepted for the Journal, the work will not be published again in the same form, 
in any language, without the consent of the Editor. 


The usual range of length of papers is from 2,500 to 4,000 words. Papers should be in a clear style and free from 
jargon. Statistical matter (formulae, tables, discussion of techniques, etc.) should be kept to a minimum, so that the 
Journal will be of interest to statistically untrained readers. Such concepts as mean, percentiles, standard deviations, 
correlation, chi-squared, standard error, critical ratio, significance and reliability, regression and factor analysis, can be 
employed in the text and need not be explained. 


Short research notes will also be published, reporting work of two kinds: (i) results of sufficient moment to merit 
publication in advance of a more comprehensive paper; (ii) work which substantially confirms or extends existing 
knowledge but which does not justify an extensive paper. 


Contributors can help to avoid delays in publication by observing certain guiding principles: 


1—Papers should be written with the utmost conciseness consistent with clarity. If the editor or a referee takes the 
wrong meaning from or fails to understand a passage, the fault is the author’s. Unless the editor can grasp the meaning 
of a sentence unequivocally, it cannot be assumed that other readers can. 


2—A paper should be written only when a piece of work is rounded off. A report of a pilot study, or a series of 
papers on the same subject written as the results come to hand, is unlikely to be accepted for publication. A 
comprehensive paper, particularly if it includes reports of several experiments or analyses, or replications on further 
samples, which give consistent results, will be given favourable consideration. 


3—Experimental reports should generally follow the standard pattern of presentation, e.g.: 


(a) A short summary at the start. 

(b) A statement of aims or objectives and a review of relevant previous work as an introduction to the 
topic—conciseness and strict selection of relevant aspects are particularly important here. 

(c) A clear statement of the sample, the procedure, the tests, the experimental material, in sufficient detail to permit a 
reader to replicate the study if he so wishes. 

(d) A factual statement of results, without speculative comment or discussion. 

(e) Either a brief list of general conclusions, not repeating results already listed, or a discussion, which should be 
strictly limited to discussing the results (if this is necessary) and which should not involve mere recapitulation or 
speculation beyond the scope of the inquiry. 

(f) Acknowledgments. 

(g) References. 


4—The summary (which is often copied by abstracting agencies) should be intelligible without reference to the 
paper. It should give an indication of the aim, scope and results of the inquiry, preferably in about three sentences. It 
should mention the size and nature of the sample, and should not include vague phrases such as "the results were 
analysed . . ." or "The implications of the findings are discussed . . .". Even for a full-length paper, the summary 
should not exceed 200 words. 


5—Figures and tables should be selected to illustrate or summarise points made in the text. The same data should 
not be reported both in table and figure; tables are generally preferable because they are much more economical. Tables 
should contain selected data: only the most important quantitative results can be reported in full. 


6—Authors should consult a current issue of the Journal in order to follow the conventions applied in the 
Journal—the use of headings, numbering of tables, the absence of footnotes, the impersonal style of reporting, 
omission of dots in standard abbreviations, reference to subjects as children and not as Ss, the lay-out of references and 
so on. The British Psychological Society pamphlet, Suggestions to Contributors, should be consulted. References 
should be cited by author and date in the text, and arranged in alphabetical order in the list of references at the end. All 
measurements should be stated in SI units. The Journal is committed to the use of non-sexist language. 


7—A manuscript should be headed with the title of the paper, the names of the authors and, in brackets, a short 
address identifying the institution where the work was done. Titles should be short. If the title exceeds 45 letters and 
spaces in length, a shortened version, suitable as a running title, should be added. 


8—The manuscript should be typed in double-spacing with wide margins on one side of the paper which is suitable 
for ink corrections and printer's instructions—not duplicated or on flimsy paper. Drawings should be in Indian ink on 
cartridge paper. Manuscripts which require re-typing before they can be handled by the printer have to be sent back to 
the authors. The Journal has no paid staff. 


9—Authors, if they specifically request, may have their papers reviewed blind. 


Authors receive 50 copies of their papers free (delivered about six weeks after publication). Extra copies may be 
had at cost price if the order is given when the proof is returned. 


The Journal will adopt the International System of Units (SI) based on the units: metre, kilogramme, second, 
ampere, Kelvin and candela. Authors who refer to physical measurements in their papers should now use SI units except 
where it would be absurd to do so. Further information about SI units, including conversion tables, is contained in the 
revised Suggestions to Contributors pamphlet issued by the British Psychological Society and obtainable, price £2.00 
post free, from The British Psychological Society, St. Andrew's House, 48 Princess Road East, Leicester LE1 7DR. 
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A Special Issue of the British Journal of Social Psychology 


Guest Editor: Gun Semin 
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Comments—R. Farr (LSE, University of London), P Drew (York University) 
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Comments—R. Harré (Linacre, Oxford University), К. Fiedler (Giessen University) 
Authors’ Rejoinder 


The Special Issue forms Part 3 of volume 25 (1986) of the British Journal of Social Psychology, edited 
by A. S. R. Manstead (University of Manchester). Volume (4 parts) price £40.50 (US$64.50). 


Orders and subscriptions to: 
The British Psychological Society 


The Distribution Centre, Blackhorse Road, Letchworth, Herts SG6 1HN, UK 
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(Rochampton Institute, London) and Margaret Harris (Birkbeck College, London) and available as 
Part 3 (September 1986). 


Contents 
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Halliday & Julian C. Leslie 
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Discussion: On social interaction as a type of explanation of language development. Stan A. Kuczaj II 


Special price for the single special issue for a limited period (until June 1987 only) £6.95 (US$11.95) 


The British Journal of Developmental Psychology, edited by Peter Bryant (University of Oxford) is 
published four times per year. Subscription price for volume 4 (1986) £37.00 (US$64.50). 


Orders and subscriptions to: 
The British Psychological Society 
The Distribution Centre, Blackhorse Road, Letchworth, Herts SG6 1HN, UK 
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The British Journal of Social Psychology 
Call for papers 


Special Issue on ‘The Social Origins and Functions of Emotions’ 


Papers are invited for a special issue of the Journal to be published as part of 
Volume 27, 1988. The issue will be devoted to the role of emotions in social 
psychology, considering both their origins and their roles in social 
interaction. Papers tackling these matters from perspectives such as 
developmental psychology and psychophysiology will be welcome, along with 
submissions from social psychologists. Both empirically based and 
theoretical papers will be considered. All papers submitted will be subject to 
blind refereeing, and authors should prepare their manuscripts accordingly. 
Further information about this special issue may be obtained from the Guest 
Editor. 


Closing date for submissions: 28 February 1987 


Manuscripts should be sent in triplicate, following the usual conventions for 
the Journal, to the Guest Editor: Dr Hugh Wagner, British Journal of Social 
Psychology, Special Issue, Department of Psychology, University of 
Manchester, Manchester M13 9PL, UK 


The British Journal of Social Psychology, edited by Dr A. S. R. Manstead, is 
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British Journal of Developmental Psychology 
Editor: Peter Bryant, Watts Professor of Psychology, University of Oxford. 


The immediate success of this new journal, launched in 1983, has demonstrated the central 
importance of its subject matter to the present-day study of psychology. The journal publishes 
full-length conceptual, empirical and review papers, as well as brief reports on work in progress, 
in the following areas: Development during childhood and adolescence; Early infant perceptual, 
cognitive and motor development; Abnormal development—the problems of handicaps, learning 
difficulties and childhood autism; Educational implications of child development; Parent-child 
interaction; Acquisition of language; Social and moral development; Effects of ageing. 
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around research, it will also be open to the evaluation of educational programmes 


Books will be reviewed and conferences reported. One of the four issues each year will be devoted 
to a special theme. 


The special issue in vol. 1 will be devoted to Psychology and the Learning of Mathematics. Guest 
Editor: Gerard Vergnaud (CNRS, France). 
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Philip Seymour applies the theories and methods of experimental 
cognitive psychology to analyse developmental dyslexia. 


He argues that the mental system underlying reading competence 
can be represented as a set of information processors and that these 
components can be investigated by applying the experimental 
methods of cognitive psychology. 


He presents a model of the cognitive system underlying reading 
competence, describes how to set up experimental procedure on 
the micro-processor, and illustrates the application of the model and 
procedure. The analyses he draws demonstrate the existence of 
differences of cognitive functioning among dyslexic subjects. 
0-7100-9841-3 R8 368pp £26.50 
International Library of Psychology 
For further information contact Sally Mussett in the Promotion Department. 
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This controversial book examines the central theme of his philosophy — the view 
that knowledge is possible exclusively in terms of logic and maths, and that it is not 
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